Open access peer-reviewed chapter

Automatic Generation of Game Content Customized by Players: Generate 3D Game Characters Based on Pictures

Written By

Hsuan-min Wang, Cheng-Lun Tsai and Chuen-Tsai Sun

Submitted: 22 May 2023 Reviewed: 29 May 2023 Published: 05 July 2023

DOI: 10.5772/intechopen.1002024

From the Edited Volume

Computer Science for Game Development and Game Development for Computer Science

Branislav Sobota and Emília Pietriková


Abstract

With advances in hardware and deep learning and the invention of generative adversarial networks (GANs), the integration of artificial intelligence (AI) into games has mostly focused on assisting game development, such as creating maps, skills, monsters, NPCs, and levels. In this chapter, a different approach is proposed: using artificial intelligence in games to allow players to customize game content. Based on this concept, a method for automatically generating 3D game character models from images is presented, specifically the parallel PIFu (Pixel-Aligned Implicit Function) model. This method leverages the characteristics of the PIFu model and combines the features of 3D models generated from the front and back views of the person captured in the images. By merging these features, the method produces 3D character models that preserve the details of both images without any missing body parts. The approach builds on existing techniques for the automatic generation of character models and further enhances them. It enables users to simply capture images of their desired characters with a mobile phone and automatically transform them into corresponding 3D models. These models are compatible with most existing games on the market, allowing players to easily create personalized appearances for their in-game characters and enhance the overall gaming experience.

Keywords

  • AI in games
  • player customization
  • 3D game character models
  • immersive gaming experience
  • deep learning

1. Introduction

With the evolution of technology and advancements in hardware devices, the gaming industry has witnessed continuous innovation. From classic games like Snake and Space Invaders to modern titles such as Monster Hunter and League of Legends, the gameplay and content of games have expanded significantly. This is largely attributable to game designers who incorporate imaginative ideas to create unique and enjoyable gaming experiences for players. As artificial intelligence (AI) technology matures, game developers have started leveraging AI not only to drive non-player characters (NPCs) but also across various other elements of the game, aiming to enhance player enjoyment and enrich the game's content.

In order to enhance gameplay and player enjoyment, many game developers have shifted their focus from 2D games to 3D games. However, this shift also poses a challenge: finding an effective way to generate large numbers of 3D characters.

When converting 2D models to 3D models, developers need to address how to preserve the characters' proportions and details and how to maintain realism and physical plausibility during character movements. They also need to ensure that the model remains consistent from multiple viewing angles. Consequently, 2D-to-3D conversion involves a significant amount of modeling work, which consumes considerable time and incurs corresponding hardware costs. Despite the existence of AI-based generation techniques, most current research focuses on facial model modification, and relatively little functionality is available for full-body modeling. As a result, the templates provided for automatic adjustment in games mostly concentrate on facial features.

Currently, many games include a step in which players create their own virtual characters before starting the beginner's tutorial. This seemingly simple step is a crucial means of letting players bring their own experiences into the game. Recently, many games have introduced systems that allow players to freely adjust parameters, enriching the range of possible appearances. However, having too many adjustable parameters can lead players to spend excessive time on character creation.

Therefore, this chapter proposes the concept of combining AI-generated content with player customization in games and, based on this concept, introduces a method for generating 3D game character models from images. The results demonstrate that this method can effectively generate 3D character models corresponding to casually captured photos, which are compatible with the majority of games on the market. It allows players to easily create their own character appearances in the game and provides game developers with a tool for 3D character modeling.


2. Literature review

Generating 3D characters from images that can be applied in games involves body shape and pose prediction, 3D model generation, and human skeleton generation. The related techniques are discussed below.

2.1 A Skinned Multi-Person Linear Model (SMPL)

The Skinned Multi-Person Linear Model (SMPL) [1] is a data-driven approach that uses a base human body model, trained on extensive data, to simulate various body shapes and poses. The base model is deformed using low-dimensional parameters obtained through principal component analysis (PCA), which represent variations in height, weight, and other body attributes.
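
To make the role of the PCA parameters concrete, the following is a minimal conceptual sketch of the SMPL formulation in Python. It is not the released SMPL implementation; the array shapes, variable names, and the simplified skinning function are illustrative assumptions.

```python
import numpy as np

# Conceptual sketch of SMPL: a template mesh is deformed by low-dimensional
# shape coefficients (betas) obtained via PCA, then posed by linear blend
# skinning. All arrays here are placeholders (zeros), not real SMPL data.

N_VERTS, N_SHAPE = 6890, 10                   # SMPL uses 6890 vertices, 10 shape PCs

v_template = np.zeros((N_VERTS, 3))           # mean body shape
shape_dirs = np.zeros((N_VERTS, 3, N_SHAPE))  # PCA shape blend shapes

def smpl_shape(betas: np.ndarray) -> np.ndarray:
    """Apply shape blend shapes: v = T + sum_k betas[k] * S_k."""
    return v_template + shape_dirs @ betas     # (N_VERTS, 3)

def linear_blend_skinning(verts, joint_transforms, skin_weights):
    """Pose the shaped mesh: each vertex is a weighted blend of the rigid
    transforms of the joints that influence it."""
    homo = np.concatenate([verts, np.ones((len(verts), 1))], axis=1)       # (N, 4)
    per_vertex_T = np.einsum('nj,jab->nab', skin_weights, joint_transforms)
    return np.einsum('nab,nb->na', per_vertex_T, homo)[:, :3]

# betas encode height/weight-like variation; the pose parameters (joint
# rotations) would be converted to joint_transforms by forward kinematics
# before skinning.
```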

2.2 Human Mesh Recovery (HMR)

The Human Mesh Recovery (HMR) model [2] is a 2D-to-3D model that can reconstruct a 3D human model from input photos containing human subjects. The architecture of HMR integrates feature extraction, the SMPL model, and GAN models. HMR takes image features as inputs to the SMPL model, which then infers the 3D human model. The inferred model is projected back onto the 2D image to compute the loss. To ensure that the generated poses are realistic and satisfy the constraints of 3D joint angles, HMR employs a GAN discriminator to distinguish between the generated model and real human models. This helps improve the quality of the generated results, making them more consistent with real human movements. Additionally, during training, HMR does not require paired 2D images and their corresponding 3D models for supervised learning.

2.3 Generating 3D models of clothed characters from 2D images

In the realm of generating 3D models of clothed human bodies, notable advancements have been made, particularly with the Pixel-Aligned Implicit Function (PIFu) approach [3, 4, 5]. PIFu excels in inferring the surface details of single and multiple views of an object, showcasing outstanding performance.

There are two common representations for object surfaces: (1) explicit surfaces and (2) implicit surfaces. Explicit surfaces directly provide the points on the surface, or obtain them through mapping relations, while implicit surfaces do not explicitly list the points but instead describe a relationship that all points on the surface satisfy. Each representation has its pros and cons. Explicit surfaces allow easy sampling of all surface points but require significant storage space and cannot determine the relationship between an arbitrary point and the surface. Implicit surfaces can easily determine the relationship between any point and the surface and require less storage space, but sampling all points on the surface is difficult. PIFu defines the object surface with an implicit function [6] because of its storage efficiency.
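
The trade-off between the two representations can be illustrated with a sphere. The snippet below is our own illustrative example, not taken from PIFu: an explicit listing of surface points versus an implicit signed-distance test.

```python
import numpy as np

r = 1.0  # sphere radius

# Explicit: enumerate points on the surface directly (easy to sample, but the
# listing says nothing about where an arbitrary query point lies).
thetas = np.linspace(0, 2 * np.pi, 64)
explicit_ring = np.stack([r * np.cos(thetas), r * np.sin(thetas),
                          np.zeros_like(thetas)], axis=1)

# Implicit: describe the relationship every surface point satisfies.
# Any query point can be classified against the surface in constant time.
def signed_distance(p: np.ndarray) -> float:
    return np.linalg.norm(p) - r        # < 0 inside, 0 on surface, > 0 outside

print(signed_distance(np.array([0.5, 0.0, 0.0])))   # -0.5 -> inside
print(signed_distance(np.array([2.0, 0.0, 0.0])))   #  1.0 -> outside
```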

Although there have been previous attempts to generate 3D models by inferring object surfaces with implicit functions [7, 8, 9], relying purely on implicit functions cannot accurately capture the details present in the input images. To address this, the concept of pixel alignment is introduced: a fully convolutional image encoder extracts features at the positions where 3D query points project onto the input image, and a multi-layer perceptron (MLP) incorporates these pixel-aligned features into the implicit function.
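
The following sketch illustrates what a pixel-aligned query might look like in PyTorch: a 3D point is projected into the image, the encoder's feature map is sampled at that pixel, and the MLP predicts occupancy from the local feature plus the point's depth. Tensor shapes, the `camera_project` callable, and the module layout are simplified assumptions rather than the exact PIFu implementation.

```python
import torch
import torch.nn.functional as F

def pixel_aligned_query(feat_map, points, mlp, camera_project):
    """
    feat_map:       (1, C, H, W) feature map from a fully convolutional encoder
    points:         (1, N, 3) query points in 3D space
    camera_project: maps 3D points to normalized image coordinates in [-1, 1]
    Returns (1, N) occupancy predictions in [0, 1].
    """
    xy = camera_project(points)                     # (1, N, 2) projected pixels
    z = points[..., 2:]                             # depth of each query point

    # Sample the image feature at the projected pixel location
    pixel_feat = F.grid_sample(feat_map, xy.unsqueeze(2),
                               align_corners=True)  # (1, C, N, 1)
    pixel_feat = pixel_feat.squeeze(-1).transpose(1, 2)    # (1, N, C)

    # The MLP sees the local image feature plus the point's depth, so local
    # detail from the image is preserved in the implicit function.
    occupancy = mlp(torch.cat([pixel_feat, z], dim=-1))    # (1, N, 1)
    return torch.sigmoid(occupancy).squeeze(-1)
```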

Compared to learning the implicit function in 3D space using only global features, incorporating image features helps preserve the local details present in the images. However, PIFu has limitations in practical applications. When reconstructing a 3D model from a single view, although fragmented models are avoided, the reconstruction of the side profile or of the unobserved side not present in the photo is not ideal. When reconstructing from multiple views, while the overall reconstruction is more complete than from a single view, it relies on photos taken from different angles with the same camera perspective and distance to the target object. Even slight perturbations can lead to poor 3D reconstructions, potentially missing important body parts such as hands and feet.

2.4 Automatic generation of 3D character skeletons

Automatic skeleton prediction can be divided into two main methods: (1) skeleton embedding and (2) skeleton extraction. In the skeleton embedding approach, the primary goal is to construct the geometric structure of the input model and define a penalty function for skeleton embedding. The predefined skeleton template is then embedded into the geometric structure to optimize the skeleton and obtain the final result. Pinocchio [10] is the most representative method using this approach. On the other hand, skeleton extraction is a more commonly used method for automatic skeleton prediction. It involves inferring the skeleton that fits the model using specific algorithms [11, 12, 13, 14] or deep learning models [15, 16].

Both methods have advantages and disadvantages. Skeleton embedding fully exploits prior knowledge of the skeleton rather than relying solely on geometric topology; for example, it knows that the ear region does not require any bones. Its drawback is that it requires prior knowledge of a skeleton prototype matching the intended motions of the 3D model. Skeleton extraction, in contrast, can handle a wide range of input models, such as birds, dinosaurs, and humans, which is why most related research focuses on it. Its downside is that prominent geometric features may cause it to produce skeleton structures that do not match the intended movements. Since this chapter focuses on human models, the skeleton embedding method is adopted.

In this chapter, the integration and improvement of the aforementioned techniques are employed to generate 3D character models suitable for use in games. Firstly, the HMR and PIFu models are utilized to generate 3D human models with clothing. The HMR model estimates the 3D pose and shape of the human body from a single 2D image, while the PIFu model performs detailed 3D reconstruction of the human body, including clothing, based on a single 2D image. This enables the generation of high-quality 3D human models with clothing.

Next, the generated 3D models undergo skeleton embedding. The skeleton embedding method, mentioned earlier, is employed to embed a predefined skeleton template into the geometric structure of the 3D models. Optimization using penalty functions is applied to achieve the final skeleton structure.

Finally, the skeleton is bound to the 3D models, and animations are created using the built-in environment of Maya. Maya is widely used software for 3D modeling and animation, offering a rich set of tools and features for skeleton binding, keyframe animation setup, and smooth animation effects. By performing skeleton binding and animation setup in Maya, interactive 3D character models with realistic movements can be created. Ultimately, the generated 3D human models with animations are exported for direct usage in games, enhancing realism and interactivity within the gaming environment.


3. Methodology

The methodology employed in this chapter is a combined model based on three components: Human Mesh Recovery (HMR), Pixel-Aligned Implicit Function (PIFu), and Pinocchio. By using this integrated model, users can generate 3D animated models of individuals by providing front and back photos. This approach facilitates the easy and automated generation of customized game characters (Figure 1).

Figure 1.

The overall architecture of the model used in this chapter.

3.1 Generation of 3D character models

The process of generating 3D character models can be divided into three stages: parallel PIFu modeling, HMR modeling, and refinement of PIFu results using HMR.

3.1.1 Parallel PIFu model

To create 3D models while preserving the detailed information of the characters in the images, a parallel PIFu model is used in this section. The architecture is based on the single-view PIFu approach, which reconstructs 3D models from a single image. However, using a single image alone cannot capture the complete model. Therefore, a new approach called parallel single-view is proposed in this chapter. This method combines the features of two 3D models generated from the front and back images of a person, maintaining the details from both images and ensuring a complete representation of the body.

The RenderPeople dataset [17], which provides 491 high-resolution clothed human body models, is employed in this method. The training and testing sets consist of 442 and 49 models, respectively. In terms of data processing, training requires input images paired with their corresponding 3D models, but RenderPeople provides only the models. To address this, each model is placed at the center of the scene and rotated through 360 degrees, rendering one image per degree; in other words, a total of 442 × 360 = 159,120 images are generated as training data. However, since the goal is to reconstruct real-world photographs, the rendered images need to account for lighting effects. To simulate lighting accurately, precomputed radiance transfer (PRT) [18] is used to precompute global illumination, resulting in rendered images that closely resemble real-world scenes.
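
This data preparation step can be summarized by a short loop; a sketch is shown below, with the mesh loader and PRT-based renderer passed in as placeholder callables since the actual rendering pipeline is not reproduced here.

```python
from typing import Callable, List, Sequence

def build_training_images(mesh_paths: Sequence[str],
                          load_mesh: Callable,
                          render: Callable) -> List:
    """Render each training mesh once per degree of a full rotation.
    load_mesh and render are placeholders for the actual loader and the
    PRT-based renderer used to approximate real-world lighting."""
    images = []
    for path in mesh_paths:                    # 442 training meshes
        mesh = load_mesh(path)
        for yaw in range(360):                 # one rendering per degree
            images.append(render(mesh, yaw_degrees=yaw))
    return images                              # 442 * 360 = 159,120 images
```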

After the initial training of the PIFu model is completed, the parallel PIFu model is trained. The features from the front and back models are combined after the PIFu output. Since the model's output is in OBJ format, no existing method handles this feature combination directly, so a simple procedure is designed in this section. First, both models are uniformly scaled and aligned to the center. Then the models are segmented into multiple parts, and the front and back features are merged by finding corresponding points. This approach produces a 3D model that includes features from both the front and back images.
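
A simplified sketch of the merging procedure is given below, assuming each PIFu output has been loaded as an (N, 3) vertex array. The actual method also segments the body into parts; only the normalization and point-correspondence steps are shown, and the nearest-neighbor averaging is our simplification.

```python
import numpy as np
from scipy.spatial import cKDTree

def normalize(verts: np.ndarray) -> np.ndarray:
    """Uniformly scale the mesh to a unit bounding box centered at the origin."""
    centered = verts - verts.mean(axis=0)
    return centered / np.abs(centered).max()

def merge_front_back(front_verts: np.ndarray, back_verts: np.ndarray) -> np.ndarray:
    """For each front vertex, find its nearest back vertex and blend the two
    positions, so details from both views contribute to the merged model."""
    front = normalize(front_verts)
    back = normalize(back_verts)
    _, idx = cKDTree(back).query(front)   # nearest-neighbor correspondences
    return 0.5 * (front + back[idx])
```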

3.1.2 HMR model

In this stage, the training data for the HMR model consists of 2D and 3D data. The 2D training data includes the LSP and LSP-extended [19], MPII [20], and MS COCO [21] datasets, which are used to train the model to extract image features and to compute the reprojection loss when the generated 3D models are projected back onto the 2D images. The 3D training data includes Human3.6M [22] and MPI-INF-3DHP [23], which are used to train the discriminator to distinguish between generated 3D models and real human models and to generate the corresponding 3D models.

The HMR model architecture consists of three parts: an encoder, a 3D regression module, and a discriminator. The encoder uses the ResNet-50 architecture to extract features from the input images. The 3D regression module is a fully connected network that combines the image features with initial SMPL (Skinned Multi-Person Linear) body parameters; through iterative error feedback, it generates body parameters that match the pose of the person in the image, and the loss is calculated by projecting the 3D joints onto the image. The discriminator consists of a shape discriminator and a pose discriminator, which judge whether the generated 3D model is real or fake. Finally, the adversarial loss is used to refine the parameters of the encoder.
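
The iterative error feedback loop at the heart of the 3D regression module can be sketched as follows. The layer sizes and iteration count are illustrative assumptions; the 85 parameters correspond to SMPL pose, shape, and camera terms.

```python
import torch
import torch.nn as nn

FEAT_DIM, PARAM_DIM, N_ITER = 2048, 85, 3   # 85 = 72 pose + 10 shape + 3 camera

# Illustrative regressor, not the released HMR architecture
regressor = nn.Sequential(
    nn.Linear(FEAT_DIM + PARAM_DIM, 1024), nn.ReLU(),
    nn.Linear(1024, 1024), nn.ReLU(),
    nn.Linear(1024, PARAM_DIM),
)

def iterative_regression(img_feat: torch.Tensor, mean_params: torch.Tensor):
    """img_feat: (B, FEAT_DIM) from the ResNet-50 encoder;
       mean_params: (B, PARAM_DIM) initial SMPL/camera parameters."""
    params = mean_params
    for _ in range(N_ITER):
        # Predict a correction given the features and the current estimate
        delta = regressor(torch.cat([img_feat, params], dim=-1))
        params = params + delta            # refine the estimate each iteration
    return params                          # fed to SMPL and the discriminator
```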

3.1.3 Model refinement

Since the HMR model continuously refines the rotational deformation from the average body model parameters of SMPL, the generated model from HMR not only accurately represents the motion of the person in the image but also avoids the depth prediction errors that cause hand distortion as observed in PIFu. Leveraging this advantage, this method scales and aligns both models proportionally and fine-tunes the HMR model based on the average position of the hands in PIFu. Finally, the coordinate positions of the hand vertices in the HMR model are combined with the features from the parallel PIFu model to perform feature merging. This approach corrects the issue of hand distortion while preserving the original features of the vertices.
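
A simplified sketch of this correction step is shown below: both meshes are normalized, the HMR mesh is shifted so its hands line up with the average hand position of the parallel PIFu mesh, and the distorted PIFu hand vertices are replaced by their nearest HMR hand vertices. Vertex-index masks for the hand regions are assumed to be known.

```python
import numpy as np
from scipy.spatial import cKDTree

def normalize(verts: np.ndarray) -> np.ndarray:
    """Uniformly scale the mesh to a unit bounding box centered at the origin."""
    centered = verts - verts.mean(axis=0)
    return centered / np.abs(centered).max()

def refine_hands(pifu_verts, hmr_verts, pifu_hand_idx, hmr_hand_idx):
    """Correct distorted PIFu hands using the undistorted HMR hand vertices."""
    pifu = normalize(pifu_verts)
    hmr = normalize(hmr_verts)

    # Align HMR to PIFu using the average hand positions of both meshes
    offset = pifu[pifu_hand_idx].mean(axis=0) - hmr[hmr_hand_idx].mean(axis=0)
    hmr_aligned = hmr + offset

    # Replace each distorted PIFu hand vertex with its nearest HMR hand vertex
    _, nn_idx = cKDTree(hmr_aligned[hmr_hand_idx]).query(pifu[pifu_hand_idx])
    refined = pifu.copy()
    refined[pifu_hand_idx] = hmr_aligned[hmr_hand_idx][nn_idx]
    return refined
```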

3.2 Skeleton generation

The 3D models generated in this chapter are all humanoid. Given the prior knowledge that the skeleton prototype is humanoid, it is ideal to use skeleton embedding to predict the skeleton positions. Therefore, this method employs the Pinocchio algorithm to predict the positions of the human body skeleton and output the coordinates of the joints and their corresponding relationships.

The Pinocchio algorithm can be divided into five main steps: (1) constructing the geometric structure graph that fits the model, (2) simplifying the input skeleton graph, (3) training the weights of different loss functions, (4) obtaining the optimal skeleton embedding result, and (5) refining the optimal result.

First, to obtain the geometric structure graph of the 3D model, the model is scaled to fit within a unit cube. An octree is then applied recursively to partition the model's surface until the geometry is fully described. With this surface information, the medial surface is estimated from discretely sampled points, and the interior of the model is filled with maximal spheres centered at points on the medial surface. Finally, edges are constructed using one of the following criteria: (1) if two spheres intersect, their centers are connected by an edge, or (2) if two spheres do not intersect but the line between their centers lies inside the model and constitutes an important edge in the skeleton structure, their centers are also connected. The resulting edges form the geometric structure graph of the model.
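
The first edge criterion can be sketched as a simple pairwise test over the packed spheres; the interior-line test of the second criterion is omitted here, and the inputs are assumed to be the sphere centers sampled on the medial surface and their maximal radii.

```python
import numpy as np

def build_sphere_graph(centers: np.ndarray, radii: np.ndarray):
    """Connect the centers of any two maximal spheres that intersect.
    centers: (M, 3) sphere centers on the medial surface; radii: (M,)."""
    edges = []
    n = len(centers)
    for i in range(n):
        for j in range(i + 1, n):
            dist = np.linalg.norm(centers[i] - centers[j])
            if dist < radii[i] + radii[j]:      # spheres intersect
                edges.append((i, j))
    return edges
```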

The second step is to optimize the computation of the skeleton embedding. In this step, the input skeleton is simplified by merging all the joints with a degree of 2. This reduces the number of joints from the original 18 to 7. The simplified skeleton is then embedded into the geometric structure graph. After the embedding is completed, the shortest paths between the joints on the geometric structure graph are computed. The simplified skeleton structure is then restored based on the original input skeleton’s proportions using interpolation.
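
The degree-2 merging can be sketched as follows, representing the skeleton as an adjacency dictionary (an assumption about the data structure; the Pinocchio implementation may differ).

```python
def simplify_skeleton(adjacency: dict) -> dict:
    """Remove every joint with exactly two neighbors and connect its
    neighbors directly, reducing an 18-joint skeleton toward 7 joints."""
    adj = {j: set(nbrs) for j, nbrs in adjacency.items()}
    changed = True
    while changed:
        changed = False
        for joint, nbrs in list(adj.items()):
            if len(nbrs) == 2:                   # degree-2 joint: merge it away
                a, b = nbrs
                adj[a].discard(joint)
                adj[b].discard(joint)
                adj[a].add(b)
                adj[b].add(a)
                del adj[joint]
                changed = True
                break
    return adj
```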

The third step aims to achieve better skeleton embedding results. To do this, an appropriate loss function is defined for the skeleton embedding. Since a single loss function is not sufficient to cover all the constraints of the skeleton, nine different loss functions are defined [24]. These loss functions correspond to different constraints: (1) short bones: avoiding excessively short bones, (2) improper orientation between joints: ensuring consistency in the orientation of the input skeleton and the embedded skeleton, (3) length differences in bones marked symmetric: avoiding inconsistent bone lengths in symmetric pairs, (4) bone chains sharing vertices: avoiding bone chains that share vertices, (5) feet away from the bottom: ensuring that the joint nodes representing the feet are located at the bottom of the model, (6) zero-length bone chains: avoiding bone chains with zero length, (7) improper orientation of bones: ensuring consistency in the orientation of the restored simplified skeleton and the original skeleton, (8) degree-one joints not embedded at extreme vertices: ensuring that degree-one nodes are sufficiently far from their parent nodes, and (9) joints far along bone-chains but close in the graph: avoiding nodes that are far along the path but close in physical distance. The weights of these loss functions are trained using SVM, and the final loss function is obtained by linear combination.

Although the loss functions are defined, optimizing them directly is challenging because the number of possible embeddings grows exponentially with the number of skeleton joints. Therefore, the branch and bound algorithm is used to estimate a lower bound for the skeleton embedding in advance, and the A* algorithm is then used to add the embedded skeleton joints one by one. Finally, the optimal skeleton embedding with minimal loss is obtained on the geometric structure graph.

After obtaining the optimal embedding result, the simplified skeleton with seven joints needs to be restored to 18 joints. The shortest paths between the joints in the geometric structure graph are found, and the restored skeleton is obtained by interpolating according to the proportions of the original input skeleton; degree-2 joints are added back to recover the same structure as the original embedded skeleton. Although the overall result is already good, some local issues remain, mainly because smaller bones receive smaller weights during optimization, so fine-tuning is required. To facilitate this fine-tuning, a continuous optimization function is defined as follows:

g(q_1, \ldots, q_s) = \alpha_A \, g_A(q_1, \ldots, q_s) + \sum_{i=2}^{s} g_i\bigl(q_i, q_{p_S(i)}\bigr)

where (q_1, \ldots, q_s), with s = 18, are the joint positions of the embedded skeleton and p_S(i) denotes the parent of the i-th joint. The first term, \alpha_A g_A(q_1, \ldots, q_s), penalizes asymmetry. Each remaining term, g_i = \alpha_S g_i^S + \alpha_L g_i^L + \alpha_O g_i^O, penalizes, respectively, joints lying far from the medial surface, bones that are too short, and bones whose direction differs from the original skeleton after embedding. The final skeleton embedding is obtained by fine-tuning with this optimization function.
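
Evaluating this objective amounts to summing the weighted penalty terms over all non-root joints. A sketch follows, with the individual penalty functions passed in as callables because their exact forms are defined in the Pinocchio work; the weight names are illustrative.

```python
from typing import Callable, Sequence
import numpy as np

def embedding_objective(q: np.ndarray,               # (18, 3) joint positions
                        parent: Sequence[int],        # parent index per joint
                        g_asym: Callable,
                        g_surface: Callable,
                        g_length: Callable,
                        g_orient: Callable,
                        a_A=1.0, a_S=1.0, a_L=1.0, a_O=1.0) -> float:
    """Weighted sum of the asymmetry penalty and per-bone penalties."""
    total = a_A * g_asym(q)
    for i in range(1, len(q)):                        # skip the root joint
        qi, qp = q[i], q[parent[i]]
        total += (a_S * g_surface(qi, qp)             # far from medial surface
                  + a_L * g_length(qi, qp)            # bone too short
                  + a_O * g_orient(qi, qp))           # orientation changed
    return total
```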

3.3 Skeleton binding and output

The final step is to bind and output the 3D model with the skeleton. In this method, Maya is used for skeleton binding and exporting in the FBX format. The chosen method for skeleton binding is Geodesic Voxel Bind, which is widely used. After binding the skeleton to the 3D model, the model can be animated according to the user’s needs, based on a timeline sequence. The reason for using the FBX format for output is that it is a common model format used in various 3D modeling software and game engines available in the market.
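
A sketch of this binding-and-export step using Maya's Python command module is shown below. The bindMethod value used for Geodesic Voxel Bind and the FBX plug-in name are assumptions that should be verified against the Maya version in use.

```python
import maya.cmds as cmds

def bind_and_export(mesh: str, root_joint: str, out_path: str):
    # Bind the skeleton to the mesh; bindMethod=3 is assumed to select
    # Geodesic Voxel binding in recent Maya versions.
    cmds.skinCluster(root_joint, mesh, bindMethod=3, toSelectedBones=False)

    # Export the bound model as FBX for use in game engines
    cmds.loadPlugin('fbxmaya', quiet=True)
    cmds.select(mesh, root_joint)
    cmds.file(out_path, force=True, type='FBX export',
              exportSelected=True, preserveReferences=True)
```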


4. Results

4.1 Results of the parallel PIFu model

In this stage, by incorporating feature fusion into the base PIFu model, we are able to address the limitation of single-view PIFu models, which only capture features from one side, as well as the model corruption issue that occurs in multi-view PIFu models when the photos have different camera perspectives and target object distances. The following are the results of this stage (Figure 2).

Figure 2.

Results of the parallel PIFu model.

4.2 Model refinement results

In this stage, the HMR (Human Mesh Recovery) model was employed to iteratively refine the rotational deformation of the average human body model, resulting in a reconstructed human body model that aligns with the body shape and movements observed in the input photos. This refined model was used to correct the distorted deformations in the hands observed in the parallel PIFu model. Additionally, a feature fusion approach was applied to preserve the distinctive features of the hands. The following are the results achieved in this stage (Figure 3).

Figure 3.

Results of model refinement.

4.3 Overall model results

In this stage, the final model generation was performed by integrating the parallel PIFu model, the refined model, and the Pinocchio model into an automated generation framework. This framework enables users to easily create 3D characters for game development. The following presents the final results obtained using our proposed method (Figure 4).

Figure 4.

Final result of the method presented in this chapter.

4.4 Limitations

Although the method presented in this chapter is capable of generating 3D models of characters, there are several limitations that need to be acknowledged.

  1. Restrictions on capturing poses: During photo capture, the photographed person needs to maintain a T-pose with the hands not touching the body and the legs apart so that they do not touch. Otherwise, the model interprets the connected regions between the hands and the body as a single entity, and the generated character's limbs end up fused to the clothing and cannot move independently.

  2. Limitations in capturing fine details: While the overall reconstruction is satisfactory, limitations in hardware capabilities and limited training data prevent the use of high-resolution photos. As a result, the prediction of fine details may not be ideal.

  3. Restrictions related to clothing: The corrective model used in the approach is based on unclothed human body models. Therefore, it cannot accurately correct the distorted parts of the hands of characters wearing long-sleeved clothing. This means that the proposed model is only suitable for characters wearing short-sleeved garments.

These limitations indicate areas for potential future improvements and research focus, such as addressing pose restrictions, enhancing the prediction of fine details, and extending the corrective model to handle different types of clothing.


5. Conclusion

This chapter introduced a novel concept that combines automatic content generation and player customization in games, along with an automated model generation framework for customizing character appearances. With this framework, players can simply capture front and back photos of the desired character, and the system will automatically generate a character model that can be used in a wide range of 3D games on the market. Compared to existing appearance generation methods [25, 26], this chapter focuses on generating full-body models rather than facial features alone. Additionally, the output format has been improved to enhance the versatility of the player-generated custom models for most games.

The model generation process in this chapter consists of three stages. The first stage involves generating the 3D character model using the parallel PIFu model, which captures front and back features, and then correcting the hand distortion issues using the HMR model. The second stage focuses on generating the character skeleton using the Pinocchio model, which conforms to the 3D model generated in the first stage. The third stage involves skeleton binding and animation generation, where the 3D model and its skeleton from the previous stages are bound and animated in the Maya environment. Finally, the model is outputted in the widely used FBX format for most 3D games. Although additional processing may be required for character animations, automatic generation is already possible for characters with skeletons. In the future, with the integration of a wider range of diverse training data, this method holds the potential to not only contribute to the advancement of the gaming industry but also facilitate progress and enhancement in the animation and film sectors, where extensive 3D modeling is a critical component. By leveraging this approach, these industries can benefit from improved efficiency and creative possibilities.


Conflict of interest

The authors declare no conflict of interest.

References

  1. Loper M, Mahmood N, Romero J, Pons-Moll G, Black MJ. SMPL: A skinned multi-person linear model. ACM Transactions on Graphics (TOG). 2015;34(6):248
  2. Kanazawa A, Black MJ, Jacobs DW, Malik J. End-to-end recovery of human shape and pose. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Salt Lake City, Utah. IEEE; 2018. pp. 7122-7131
  3. Alldieck T, Magnor M, Bhatnagar BL, Theobalt C, Pons-Moll G. Learning to Reconstruct People in Clothing from a Single RGB Camera. arXiv preprint arXiv:1903.05885. 2019
  4. Huang Z, Li T, Chen W, Zhao Y, Xing J, LeGendre C, et al. Deep volumetric video from very sparse multi-view performance capture. In: Proceedings of the European Conference on Computer Vision (ECCV). Munich. Springer; 2018. pp. 336-354
  5. Saito S, Huang Z, Natsume R, Morishima S, Kanazawa A, Li H. PIFu: Pixel-aligned implicit function for high-resolution clothed human digitization. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV). 2019. pp. 2304-2314
  6. Sclaroff S, Pentland A. Generalized implicit functions for computer graphics. ACM SIGGRAPH Computer Graphics. 1991;25(4):247-250
  7. Chen Z, Zhang H. Learning Implicit Fields for Generative Shape Modeling. arXiv preprint arXiv:1812.02822. 2019
  8. Park JJ, Florence P, Straub J, Newcombe R, Lovegrove S. DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation. arXiv preprint arXiv:1901.05103. 2019
  9. Mescheder L, Oechsle M, Niemeyer M, Nowozin S, Geiger A. Occupancy Networks: Learning 3D Reconstruction in Function Space. arXiv preprint arXiv:1812.03828. 2019
  10. Bogo F, Romero J, Loper M, Black MJ. Dynamic FAUST: Registering human bodies in motion. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, Nevada. IEEE; 2016. pp. 2803-2812
  11. Liu P-C, Wu F-C, Ma W-C, Liang R-H, Ouhyoung M. Automatic animation skeleton construction using repulsive force field. In: Proceedings of the 11th Pacific Conference on Computer Graphics and Applications. Canmore. IEEE Computer Society; 2003. pp. 409-413
  12. Wade L, Parent RE. Automated generation of control skeletons for use in animation. The Visual Computer. 2002;18:97-110
  13. Lee T-C, Kashyap RL. Building skeleton model via 3-D medial surface axis thinning algorithms. CVGIP: Graphical Models and Image Processing. 1994;56(6):462-478
  14. Au OK-C, Tai C-L, Chu H-K, Cohen-Or D, Lee T-Y. Skeleton extraction by mesh contraction. ACM Transactions on Graphics. 2008;27(3):1-10
  15. Xu Z, Zhou Y, Kalogerakis E, Singh K. Predicting Animation Skeletons for 3D Articulated Models via Volumetric Nets. arXiv preprint arXiv:1908.08506. 2019
  16. Xu Z, Zhou Y, Kalogerakis E. RigNet: Neural rigging for articulated characters. ACM Transactions on Graphics. 2020;39(4):58:1-58:14
  17. RenderPeople [Online]. Available from: https://renderpeople.com/ [Accessed: June 01, 2023]
  18. Sloan P-P, Kautz J, Snyder J. Precomputed radiance transfer for real-time rendering in dynamic, low-frequency lighting environments. ACM Transactions on Graphics. 2002;21(3):527-536
  19. Johnson S, Everingham M. Clustered pose and nonlinear appearance models for human pose estimation. British Machine Vision Conference (BMVC). 2010;2(4):5
  20. Andriluka M, Pishchulin L, Gehler P, Schiele B. 2D human pose estimation: New benchmark and state of the art analysis. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Columbus. IEEE; 2014. pp. 3686-3693
  21. Lin T-Y, Maire M, Belongie S, Bourdev L, Girshick R, Hays J. Microsoft COCO: Common Objects in Context. arXiv preprint arXiv:1405.0312. 2015
  22. Ionescu C, Papava D, Olaru V, Sminchisescu C. Human3.6M: Large scale datasets and predictive methods for 3D human sensing in natural environments. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2014;36(7):1325-1339
  23. Mehta D, Sridhar S, Sotnychenko O, Rhodin H, Shafiei M, Seidel H-P, et al. VNect: Real-time 3D Human Pose Estimation with a Single RGB Camera. arXiv preprint arXiv:1705.01583. 2017
  24. Baran I, Popović J. Automatic rigging and animation of 3D characters. ACM Transactions on Graphics. 2007;26(3)
  25. Shi T, Yuan Y, Fan C, Zou Z, Shi Z, Liu Y. Face-to-Parameter Translation for Game Character Auto-Creation. arXiv preprint arXiv:1909.01064. 2019
  26. Lin J, Yuan Y, Zou Z. MeInGame: Create a Game Character Face from a Single Portrait. arXiv preprint arXiv:2102.02371. 2021
