Open access peer-reviewed chapter

Using Machine Learning Techniques to Discover Novel Thermoelectric Materials

Written By

Ebrar Yildirim and Övgü Ceyda Yelgel

Submitted: 10 September 2023 Reviewed: 24 September 2023 Published: 31 October 2023

DOI: 10.5772/intechopen.1003210

From the Edited Volume

New Materials and Devices for Thermoelectric Power Generation

Basel I. Abed Ismail


Abstract

Thermoelectric materials can be utilized to build devices that convert waste heat to power or vice versa. In the literature, the best-known thermoelectrics, however, are based on rare, costly or even hazardous materials, limiting their general usage. New types of effective thermoelectric materials are thus required to enable worldwide deployment. Although theoretical models of transport characteristics can aid in the creation of novel thermoelectrics, they are currently too computationally costly to be used simply for high-throughput screening of all conceivable candidates in the wide chemical space. Machine learning (ML) has been viewed as a promising technique to aid materials design/discovery because of its quick inference time. In this book chapter, we provide the whole workflow for machine learning applications to the identification of novel thermoelectric materials, predicting electrical and thermal transport properties and optimizing processes for materials and structures using cutting-edge ML methods.

Keywords

  • thermoelectric materials
  • machine learning
  • thermoelectric efficiency
  • electrical and thermal transport
  • thermoelectric figure of merit
  • rapid materials discovery

1. Introduction

The rise in global energy consumption has become a major societal issue and the principal driver of the search for alternative energy sources. Thermoelectric (TE) technologies, with their ability to convert energy, will be crucial for the development of renewable resources in the near future. Because more than 60% of all generated energy is lost as waste heat, thermoelectric materials that reduce or recover waste heat are indispensable as an alternative energy source [1, 2, 3]. In TE materials, according to the phenomena known as the “Seebeck effect,” “Peltier effect” and “Thomson effect,” either a temperature difference produces an electric potential or an electric potential produces a temperature difference. Although the TE effect is nonzero in every material, in the vast majority of materials it is too weak to be useful. The TE effect allows heat energy that would otherwise be wasted or rarely used to be exploited as efficiently as possible. Thermoelectric conversion has several benefits: it is reliable, scalable, quiet, portable and ecologically benign. Thermoelectric materials find a variety of applications, including refrigeration, waste heat recovery, energy conversion systems, infrared sensors, space missions and research [1, 2, 3].

The thermoelectric efficiency is characterized by the dimensionless figure of merit ZT, which indicates the performance of TE materials:

$$ZT = \frac{S^2 \sigma}{\kappa_{\mathrm{total}}}\, T \tag{1}$$

where S is the Seebeck coefficient, σ is the electrical conductivity, T is the absolute temperature and κtotal = κph + κc + κbp is the total thermal conductivity. The total thermal conductivity consists of three contributions: the phonon (lattice) contribution κph, the carrier contribution (electrons or holes) κc and the bipolar contribution (electron-hole pairs) κbp [4]. Materials with high electrical conductivity, a high Seebeck coefficient and low total thermal conductivity should be selected to achieve high TE performance and efficiency. Since these three physical quantities are coupled through physical laws (S and σ through the Boltzmann transport equation; σ and κc through the Wiedemann-Franz law), materials combining all three properties are very rare. For the same reason, it is also rather difficult to determine these thermoelectric characteristics experimentally or theoretically.
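As a quick numerical illustration of Eq. (1), the sketch below (our own illustration, not taken from the cited works; the material values are rough order-of-magnitude assumptions for a Bi2Te3-like compound) computes ZT from its constituent transport quantities:

```python
# A minimal sketch of Eq. (1): the dimensionless figure of merit ZT from the
# Seebeck coefficient, electrical conductivity, temperature and the three
# thermal-conductivity contributions. SI units assumed (V/K, S/m, K, W/(m K)).

def figure_of_merit(seebeck, sigma, temperature, kappa_ph, kappa_c, kappa_bp=0.0):
    """ZT = S^2 * sigma * T / kappa_total, with kappa_total = kappa_ph + kappa_c + kappa_bp."""
    kappa_total = kappa_ph + kappa_c + kappa_bp
    return seebeck**2 * sigma * temperature / kappa_total

# Illustrative order-of-magnitude values for a Bi2Te3-like material at 300 K
zt = figure_of_merit(seebeck=220e-6, sigma=1.0e5, temperature=300,
                     kappa_ph=1.0, kappa_c=0.45)
print(f"ZT = {zt:.2f}")
```

Note how the phonon contribution κph enters only the denominator, which is why suppressing lattice thermal conductivity is a central optimization strategy.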

The efficiency limit required for the widespread use of thermoelectric systems is ZT > 1, at which roughly 10% conversion efficiency is achieved [5]. To address the current bottleneck of TE technology, rapid research and discovery of new TE materials with the desired performance are required. Finding the best-performing thermoelectric material for various thermoelectric module layouts is the major problem, much like discovering many other energy technology candidates (such as solar panels, solid-state batteries and catalysts). Materials analysis and design have grown in popularity in recent times, with an emphasis on statistical modeling, development and the discovery of new materials with specific properties. Thanks to ab initio methods, the computation of material properties from first principles has been feasible since the 1970s. With the computing power of modern supercomputer clusters, it is possible to perform large-scale computations. However, despite this enormous computational power, such campaigns still require hundreds or thousands of core-years. Therefore, artificial intelligence (AI) technology appears to be one of the most promising and important approaches in the discovery and design of next-generation materials. AI technology is now in its data-driven phase, in which prediction functions are learned from previously accumulated information. At this stage, data-driven techniques employ material knowledge from experimental data or high-throughput simulations to assist in the discovery, design and optimization of novel materials. TE properties (S, σ, κtotal and ZT) can be accurately predicted using artificial intelligence techniques. The combination of big data and machine learning algorithms will make it possible to exploit the potential of existing research data to the fullest and to develop the methods needed for the emergence of next-generation TE materials.
Unfortunately, the number of studies in the literature on machine learning methods focused on next-generation TE material discovery is still quite limited. It should also be noted that although artificial intelligence techniques are more efficient than experiments and theoretical computational methods, a lack of data and skewed data severely constrain the creation and growth of reliable predictive models. In addition, materials science includes various data categories such as organic materials, metals, semiconductors, etc. [6]. Unlike AI-powered fields such as image or natural language processing, the outputs of materials science cannot be easily obtained and verified, which makes it difficult to build valuable datasets [7, 8, 9]. Material innovations have played an important role in the science and technology revolution, and AI technology should be considered one of the most promising and important strategies for developing the next generation of materials. Data-driven approaches, which acquire a predictive perspective by learning from existing features, are now the dominant artificial intelligence technologies. Data-driven techniques leverage materials knowledge from experimental data or high-throughput calculations to aid material discovery, design and optimization [10]. Even though they are more efficient than experiments and theoretical methods, AI approaches still have a long way to go. One of the most significant barriers to advancing materials science with AI technologies is the lack of data: insufficient, skewed data severely hampers the construction and development of sophisticated prediction models. AI approaches can reliably predict the TE characteristics that enter the dimensionless thermoelectric figure of merit (such as electrical conductivity, thermal conductivity and/or Seebeck coefficient) [11, 12, 13]. Furthermore, few studies have been conducted to identify representative TE material descriptors and AI models for predicting TE properties [10].

Thanks to decades of sustained TE research, the enormous body of observations and research data now available makes a data-driven approach possible. The development of AI approaches has greatly aided the better utilization of these resources. The combination of big data and machine learning algorithms will be capable of exploring the full potential of existing research data and developing the procedures required for the creation of next-generation TE materials. Data-driven methods are widely used in materials science to learn from existing data, predict mechanical properties and responses, develop predictive functions, screen materials and facilitate design. Materials information from experimental data or high-throughput calculations is used in data-driven methodologies to improve materials discovery, design and optimization. TE properties such as electrical conductivity, thermal conductivity and the Seebeck coefficient, which enter the dimensionless thermoelectric figure of merit, can be reliably predicted using artificial intelligence approaches. On the other hand, the amount of machine learning research focusing solely on TE materials is relatively limited. In addition, only a few studies have been conducted to identify representative TE material descriptors and AI models for predicting TE properties [14, 15, 16].

The purpose of this review is to provide a comprehensive overview of the most effective strategies for predicting properties and optimizing processes for materials and structures using cutting-edge ML methods and machine learning-driven optimization approaches in materials science, with a focus on thermoelectric materials. Our paper begins with the thermoelectric material data-generating approach and then moves on to the use of machine learning models and a review of the literature. We intend the breadth of this paper to be useful to both academia and industry as a reference on thermoelectric materials research and advances. Figure 1 shows the logical structure of our article. In the first half, we discuss how a thermoelectric database can be generated from theoretical calculations or experimental observations so that new materials can be described using machine learning approaches. In the second part, machine learning model evaluation methods for TE materials and different machine learning techniques are explained in turn, and a comparative discussion presents which training models will help to discover new materials with higher accuracy. The studies reported in the literature so far are listed in detail in the last part of the paper.

Figure 1.

The proposed workflow of this chapter. The first step is learning from theoretical calculations or experimental results thus thermoelectric database can be created. The second step is machine learning model training and choosing the best suitable algorithms to make highly accurate predictions. The third step presents the discovery of new thermoelectric materials.


2. Thermoelectric database generation

Machine learning-based approaches use materials knowledge from experimental data or high-throughput theoretical calculations to aid in new thermoelectric materials discovery, design and optimization. We can give these database sources and their explanations as follows.

2.1 From experimental characterizations

To learn models, machine learning needs a database of previously acquired knowledge. The materials database has to be labeled with the appropriate TE attributes for machine learning-guided TE materials discovery to work. TE efficiency is dominated by three coupled material properties: the Seebeck coefficient, electrical conductivity and thermal conductivity. Because these three factors influence each other, maximizing all three parameters is the most difficult challenge in TE materials. Thus, these three parameters are the most popular labels for TE machine-learning models. The band structure, band gap and phonon dispersion of TE materials, which control the electron and phonon distributions, are also common choices of labels. The creation of a machine learning-based material discovery tool requires access to vast volumes of data from which the learning process can derive accurate correlations between input and output pairs. The most typical source of these TE labels for various materials is experimental characterization, which supplies machine learning models with the results of TE-related measurements from diverse materials research fields. The UCSB database is one of the most extensive TE materials databases [17, 18]. The UCSB database includes various TE properties for more than 1000 different compounds, abstracting information from more than 100 publications. As shown in Figure 2, the UCSB database, combined with appropriate visualization tools, can provide users with an efficient approach to developing new TE materials from experimental characterizations.

Figure 2.

A snapshot of the UCSB website and a look at its contents.

2.2 From theoretical calculations

In addition to TE data obtained through experimental characterization, first-principles computation of materials at the atomic scale is another method for obtaining TE data. Compared to the experimental characterization database, theoretical calculation results have the value of standardization, which implies that the TE efficiency results will not be influenced by equipment, human or measurement error. The Seebeck coefficient, carrier thermal conductivity, phonon thermal conductivity and electrical conductivity can be computed using the Boltzmann transport equation.

First-principles calculations are capable of producing trustworthy and accurate TE data, but they are computationally costly. As a result, it is challenging to fulfill the machine learning requirement for huge amounts of data using first-principles computation alone. High-throughput first-principles computing was developed to tackle this dilemma [19]. The computational cost can be greatly decreased in high-throughput first-principles research, with some loss of accuracy [20, 21, 22]. High-throughput first-principles calculations can save results to massive material databases for later use, such as rapid material screening. As shown in Figure 3, the JARVIS-DFT database contains TE performance data from density functional theory (DFT) calculations for approximately 36,000 three-dimensional and 900 two-dimensional materials. Along with the electronic thermal conductivity, electrical conductivity and Seebeck coefficient, the JARVIS-DFT database also includes the lattice thermal conductivity. This data has also been utilized to create machine learning classification models for prescreening materials with good TE characteristics. Additionally, Table 1 shows a selection of publicly available datasets of thermoelectric characteristics that may be utilized for machine learning.

Figure 3.

A snapshot of the JARVIS-DFT website and summary of its contents.

| Dataset | Year | References | Data source | Compounds | Features |
|---|---|---|---|---|---|
| Wang et al. | 2011 | [23] | Theory | 2585 | PF, m* |
| Carrete et al. | 2014 | [24] | Theory | 450 | κph |
| TE Design Lab. | 2016 | [25, 26] | Theory | 2701 | κph, μ, mD |
| Ricci et al. | 2017 | [27, 28] | Theory | 47,737 | σ, S, κc |
| Xi et al. | 2018 | [8] | Theory | 161 | PF |
| Chen et al. | 2019 | [29] | Experiment | 100 | κph |
| Starrydata2 | 2019 | [30, 31] | Experiment | 434 | σ, S, κtotal |
| Priya et al. | 2021 | [32, 33] | Experiment | 585 | λi |
| Jaafreh et al. | 2021 | [34] | Theory | 119 | κph |
| Miyazaki et al. | 2021 | [35] | Theory | 143 | κph |
| MIP–3d | 2021 | [36, 37] | Theory | 4400 | σ, S |
| Tranås et al. | 2022 | [38] | Theory | 122 | κph |

Table 1.

A list of publicly available datasets of thermoelectric properties that can be used for machine learning is presented.

Here, the physical properties are indexed as follows: PF is the power factor, m* is the carrier effective mass, κph is the phonon thermal conductivity, μ is the carrier mobility, mD is the density of states effective mass, S is the Seebeck coefficient, κc is the carrier thermal conductivity, κtotal is the total thermal conductivity, and λi is the ionic conductivity.


3. Machine learning (ML)

ML terminology and related definitions are explained in this section to help readers better understand and become familiar with the various machine learning categories. As depicted in Figure 4, ML models are classified into three types: supervised, unsupervised and reinforcement learning (RL). Input and output variables, sometimes referred to as independent and dependent variables, are included in the training dataset for supervised learning. In the field of materials science, we may envision a dataset that includes both chemical and physical attributes: structures are the independent variables, whereas material properties are the dependent variables. The machine learning algorithm is programmed to learn the function that represents the connection between independent and dependent variables.

Figure 4.

Schematic representation of machine learning algorithms. The hierarchy of several machine learning algorithms, including supervised, unsupervised and reinforcement learning methods.

3.1 Supervised learning

The use of labeled datasets to train algorithms for reliable data classification or result prediction characterizes supervised learning. The structure of process steps in supervised learning is shown in Figure 5.

Figure 5.

Supervised learning workflow.
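The supervised workflow just described can be sketched in a few lines of scikit-learn; the data below is synthetic and merely stands in for a labeled thermoelectric dataset (descriptors and target are our own hypothetical construction):

```python
# A minimal sketch of the supervised learning workflow: split labeled data,
# train a model on the training split, evaluate on held-out samples.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))                                   # hypothetical material descriptors
y = X[:, 0] ** 2 + 0.5 * X[:, 1] + 0.1 * rng.normal(size=500)   # hypothetical target property

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)
print(f"held-out R^2 = {r2_score(y_test, model.predict(X_test)):.2f}")
```

Evaluating on the held-out split, rather than the training data, is what distinguishes a genuine test of generalization from simple memorization.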

3.1.1 Regression

Most of the work applying machine learning models in the TE materials field has used regression. Regression models produce continuous output values for a given input. They represent the link between one or more quantitative or categorical independent variables and a quantitative dependent variable. To predict the Seebeck coefficient, electrical conductivity, thermal conductivity, etc., in the study of TE materials, researchers need well-organized and well-chosen material characteristics. Sophisticated regression frameworks are used to forecast desirable material characteristics. Such regression models may be roughly categorized into two groups: deep learning models, which primarily rely on neural network theories, and classical statistical learning models, which are typically based on classical statistical learning theory. Support vector regression, tree-based models, Gaussian processes and linear regression are the statistical learning techniques most frequently utilized in existing research to predict the characteristics of TE materials.

3.1.2 Linear regression/multiple regression (LR/MR)

Linear/multiple regression is the linear modeling of a scalar response's connection with one or more explanatory variables, also referred to as the dependent and independent variables [39]. Simple linear regression refers to the scenario with a single explanatory variable, while multiple linear regression refers to the situation with several. This differs from the phrase “multivariate linear regression,” which predicts multiple correlated dependent variables rather than a single scalar variable. The assumption of a linear regression model is that the regression function is linear in the input variables X1, …, Xp, which may be descriptors of the input material. The fundamental linear regression model corresponds to [39]

$$f(x) = \beta_0 + \sum_{j=1}^{p} X_j \beta_j \tag{2}$$

In this case, β0 stands for the learned bias term and βj for the learned weight corresponding to Xj. Linear models presuppose a linear connection, or a good approximation of one, between the input independent variables and the output dependent variable. In this paradigm, the input variables might be the original quantitative or categorical values, such as material descriptors, or transformations of the original values, such as logs, square roots, polynomials or other transformations. The parameters in Eq. (2) are estimated with ordinary least squares (OLS), minimizing the squared error [39]

$$\min_{\beta} \sum_{i=1}^{N} \big(y_i - f(x_i)\big)^2 \tag{3}$$

where N is the number of samples in the training set, xi is the feature vector of sample i, and yi is its actual target value. Minimizing Eq. (3) to estimate the parameters is equivalent to solving the normal equation [39]

$$\hat{\beta} = \left(X^{T}X\right)^{-1} X^{T} y \tag{4}$$

where X is the feature matrix and y is the target vector. While the basic form of linear regression is simple and useful in many scenarios, it can overfit when the learned weights for some variables grow too large: the model then fits the training observations but not unseen data. Shrinkage strategies can be employed to alleviate the model's high variance. The most commonly used regularizations are ridge regression and Lasso regression. Both ridge and Lasso limit the size of the parameters to prevent overfitting by adding penalty terms to the OLS objective, controlled by the amount λ [39].

In ridge regression, the model encourages small parameter sizes using the L2 norm, while in Lasso regression, the model encourages parameters of exactly 0 using the L1 norm. When the feature matrix is invertible, the OLS and ridge parameters can be estimated by solving linear systems. A different approach is to use gradient-based optimization methods such as stochastic gradient descent, which minimize the objective regardless of whether the feature matrix is invertible [39]:

$$\min_{\beta} \sum_{i=1}^{N} \big(y_i - f(x_i)\big)^2 + \lambda \sum_{j=1}^{p} \beta_j^2 \tag{5}$$
$$\min_{\beta} \sum_{i=1}^{N} \big(y_i - f(x_i)\big)^2 + \lambda \sum_{j=1}^{p} |\beta_j| \tag{6}$$
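The ridge (L2) and Lasso (L1) penalties of Eqs. (5) and (6) are available directly in scikit-learn, which calls the penalty weight λ `alpha`. The sketch below (our own illustration on synthetic data) shows the characteristic behavior: ridge shrinks all coefficients, while Lasso drives many exactly to zero:

```python
# Ridge (Eq. 5) and Lasso (Eq. 6) regularization compared with plain OLS.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
true_beta = np.array([1.5, -2.0, 0.0, 0.0, 0.5, 0, 0, 0, 0, 0])  # sparse ground truth
y = X @ true_beta + 0.1 * rng.normal(size=200)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty: shrinks all coefficients toward zero
lasso = Lasso(alpha=0.1).fit(X, y)   # L1 penalty: sets many coefficients exactly to zero

print("nonzero OLS coefs:  ", int(np.sum(np.abs(ols.coef_) > 1e-6)))
print("nonzero Lasso coefs:", int(np.sum(np.abs(lasso.coef_) > 1e-6)))
```

Because Lasso yields sparse solutions, it doubles as a feature-selection tool, which is one reason it appears in the TE literature cited below.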

Material descriptors and their transformations can serve as input independent variables, and the desired material qualities as output dependent variables, when linear regression models are applied to TE materials. To evaluate the power factors of sintered powders, Wang et al. used a linear regression analysis [23]. They found that the power factor was strongly correlated with the electronic band gap and carrier effective mass. Utilizing PCA-transformed features, van Roekeghem et al. [40] used the linear regression model to calculate the force constants of semiconductor oxides and fluorides with cubic perovskite structures at various temperatures. To calculate the elastic bulk and shear moduli of polycrystalline materials, De Jong et al. built a polynomial feature basis from composition and structural descriptors and used Lasso regression with gradient boosting [41]. The trained model was also utilized to screen for very hard materials. Miller et al. examined the use of the linear regression model, along with other techniques, to forecast the carrier concentration range of diamond-like semiconductors [42]. Given the composition of ionic radii, Li et al. utilized kernel ridge regression to predict the dissociation energy and verified the trained model using the formability of actual perovskites [43]. The regression model's success demonstrated that machine learning techniques applied to DFT-computed data might guide the experimental engineering of stable perovskites. Iwasaki et al. [44] calculated thermopower using quadratic polynomial Lasso regression and the elastic net. When introducing machine learning methods to a particular regression problem for the first time, researchers should consider linear regression models, since they are straightforward and simple to comprehend. However, only in a few cases is the assumption of a linear relationship between the input features and the output target valid. For TE materials with complex nonlinear interactions between input descriptors and material properties, models that can capture nonlinearity should be considered.

3.1.3 Classification

3.1.3.1 Support vector machine (SVM)

Support vector machines (SVMs), also called support vector networks, are supervised learning models for classification and prediction. Given a set of labeled training examples, the SVM training algorithm builds a non-probabilistic binary classifier that assigns new examples to one of two categories [45]. SVM maps training examples to points in space so as to maximize the width of the gap separating the two categories. New examples are then mapped into the same space and assigned to a category according to which side of the gap they fall on. By implicitly mapping inputs into high-dimensional feature spaces, SVMs can use the kernel trick to perform efficient nonlinear classification. Support vector regression (SVR), a regression method, rests on the same theory as SVM. When data is unlabelled, supervised learning is not applicable, and an unsupervised learning approach is required in which data is organically sorted into categories. Support vector clustering algorithms classify unlabelled data by applying the statistics of support vectors developed in the SVM algorithm; it is one of the most frequently employed clustering methods in industrial applications [45].
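A minimal SVR sketch (our own illustration on synthetic 1-D data, using scikit-learn) shows the kernel trick in action for regression:

```python
# Support vector regression with an RBF kernel: the kernel implicitly maps
# samples into a high-dimensional feature space, where the fit is linear.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(2)
X = np.sort(rng.uniform(0, 5, size=(80, 1)), axis=0)   # 1-D inputs
y = np.sin(X).ravel() + 0.1 * rng.normal(size=80)      # nonlinear noisy target

# C controls the penalty for errors; epsilon sets the width of the
# "tube" within which deviations are tolerated without penalty.
svr = SVR(kernel="rbf", C=10.0, epsilon=0.05).fit(X, y)
print(f"train R^2 = {svr.score(X, y):.2f}")
```

A linear kernel would fail badly on this sinusoidal target; the RBF kernel captures the nonlinearity without any explicit feature engineering.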

3.1.3.2 Decision trees (DT)

Decision trees, also known as tree-based models, are non-parametric and are therefore employed in supervised learning for both classification and regression. A DT is a decision-making tool that uses a tree-based representation of choices and their possible consequences [46]. By splitting the data on certain feature values according to specified criteria, they can capture nonlinear correlations between the predictive and target variables. The goal of each split is to generate more homogeneous subsets in which the target values are more similar to each other than they were before the split. Decision trees search over the entire dataset, examine each distinct value of each feature, and find the feature and split value that divide the data into two subsets so that the total error is minimized. Thus, the conditions used for data splitting are determined by the resulting homogeneity of the target values. As the data is divided through the nodes, decisions are made at the leaves of the tree. In classification trees, the decision variable is categorical [47]. Owing to this mechanism, tree models can be used to determine the importance of a feature, since the most effective feature will be the one that splits the data first. One of the biggest problems with decision trees is overfitting. A single tree tends to overfit the training set (it is sensitive to changes in the training data), leading to poor generalization on previously unseen data. To address this problem, ensemble methods such as random forests [48] and gradient-boosted trees [49] can be used. There are several advantages to using nonlinear, non-parametric tree-based regression models. First, tree-based models provide great interpretability, since decisions are made in a definite sequence based on features and their values. Second, tree-based models can handle both categorical and continuous input features natively, with no data preprocessing required. Lastly, while automatically reflecting the significance of input features, tree-based models can capture the complex nonlinear relationships that exist between input-output pairs. Tree-based models are therefore frequently utilized to estimate TE material properties from material descriptors.
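The ensemble idea and the feature-importance mechanism described above can be sketched as follows (our own illustration on synthetic data; in the cited studies the descriptors would be chemical and structural features):

```python
# A random forest averages many decision trees, reducing the variance of a
# single overfit tree, and exposes per-feature importances.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
X = rng.normal(size=(400, 5))
y = 2.0 * X[:, 0] + X[:, 1] ** 2 + 0.1 * rng.normal(size=400)  # only features 0 and 1 matter

tree = DecisionTreeRegressor(random_state=0)
forest = RandomForestRegressor(n_estimators=300, random_state=0)

# A single tree overfits; the ensemble generalizes better on held-out folds.
print("tree   CV R^2:", round(cross_val_score(tree, X, y, cv=5).mean(), 2))
print("forest CV R^2:", round(cross_val_score(forest, X, y, cv=5).mean(), 2))

forest.fit(X, y)
print("feature importances:", forest.feature_importances_.round(2))
```

The importances correctly concentrate on the two informative features, which is the same mechanism the TE studies below exploit to identify influential material descriptors.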

Among the decision tree investigations in materials discovery, Carrete et al. used random forest regression to predict the lattice thermal conductivity of half-Heusler compounds based on chemical, compound and thermal conductivity information [24]. The learned regression model was also used to examine thermodynamic stability. Gaultois et al. trained a random forest model to estimate the Seebeck coefficient, thermal conductivity, electrical resistivity and band gap using data from across the periodic table [50]. Additionally, the developed model successfully suggested a novel compound from the real chemical space that could be tested experimentally, demonstrating the potential of employing machine learning techniques to guide materials discovery and design.

Furmanchuk et al. [51] utilized the random forest to quickly predict the properties of experimentally synthesized materials and to determine the Seebeck coefficient of crystalline materials. Miller et al. employed the random forest, in addition to the linear model, to calculate the repeatability and range of carrier concentrations for diamond-like semiconductors [52]. For this objective, however, the random forest did not perform better than the linear model. To estimate the interfacial thermal resistance between two materials using well-selected physical, chemical and material-attribute descriptors, Wu et al. employed LSBoost's regression tree ensembles [52]. With all descriptors, the ensemble model's coefficient of determination (R2) was 0.919, while with only feature descriptors and thickness it was 0.907. Iwasaki et al. employed a decision tree regression model, in addition to the elastic net and quadratic polynomial Lasso regression inside a linear model framework, to estimate thermopower [53]. The model assisted in investigating the underlying physics of the spin-driven TE phenomenon and in developing materials that exhibit these effects. The ability of the random forest regression model to forecast the figure of merit (ZT) of hot-extruded CuxBi2Te2.85+ySe0.15 TE materials was studied by Wang et al. [53].

3.1.4 Deep learning

Deep learning (DL) approaches are based on neural network theories, which differ from traditional machine learning techniques in that processes can be represented by connected neurons [54]. Artificial neural networks (ANNs) are suitable estimators for every function, according to the universal approximation theorem [55]. Instead of using a very large number of neurons in a single layer to capture complicated mappings within the data, multiple connected layers of neurons can resolve the intractability issue while retaining performance. Deep learning techniques may learn many layers of representations of the original input data, created by nonlinear modules modifying the representation one level at a time [56]. Deep learning techniques can also capture complex mappings. The capacity for representation learning permits the use of the most unprocessed inputs and does not need extensive feature engineering or selection. Feed-forward fully connected neural networks (FCNNs) are the architecture most commonly employed in TE materials applications. Despite having advantages over traditional machine learning models, deep learning is often over-parameterized and hence needs a lot of data to acquire a good mapping that generalizes effectively. Due to data scarcity and data sparsity, this imposes a significant limitation when using deep learning algorithms to predict material attributes. In addition, the lack of a predefined model shape and the complicated hierarchy of layers and neuron activations make deep learning models difficult to explain.

3.1.4.1 Neural networks (NN)

The design and operation of biological neural networks served as the inspiration for the machine learning technique known as neural networks. They are made up of layers of linked nodes that process incoming data and generate output. Neural networks are utilized to recognize pictures, comprehend spoken language and forecast time series, among many other tasks [57, 58, 59]. As learning advances, the weights of the connections, which are referred to as edges, change. Different adjustments to the inputs are carried out in different layers. A known input, a known output, and probability-weighted associations recorded in the network's data structure are used to train ANNs. To train a neural network, one uses the error, which is the discrepancy between the processed output of the network and the desired output. According to a learning rule and the error value, the network's weighted associations are updated. When a certain number of modifications yield outcomes reasonably close to the expected outcomes, training may be considered complete. DL belongs to a wider family of machine learning algorithms built on representation learning and neural networks. By extracting higher-level characteristics from raw data through several layers, DL is often more accurate than other machine learning techniques. Examples of DL methods include deep neural networks (DNNs), convolutional neural networks (CNNs) and recurrent neural networks (RNNs). Hidden layers allow data to flow from the input layer to the output layer. Although DL approaches are more accurate and effective than other machine learning techniques, they still need a lot of data and are computationally costly because of the numerous parameters that must be optimized during training.
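A feed-forward fully connected network of the kind most used in TE applications can be sketched with scikit-learn's `MLPRegressor` (our own illustration on synthetic data; the layer sizes are arbitrary assumptions, and real studies would tune them):

```python
# A minimal feed-forward fully connected network (FCNN) for regression.
# Inputs are standardized first, since neural networks are scale-sensitive.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(4)
X = rng.normal(size=(600, 6))              # hypothetical descriptors
y = np.sin(X[:, 0]) + X[:, 1] ** 2         # a nonlinear target

# Two hidden layers of 64 neurons each, trained by backpropagation.
net = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0),
)
net.fit(X, y)
print(f"train R^2 = {net.score(X, y):.2f}")
```

Even this tiny network has thousands of weights, which illustrates the chapter's point that deep models are over-parameterized and data-hungry relative to the small datasets typical of TE research.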

3.2 Unsupervised learning

Unlabelled datasets are analyzed and clustered using machine learning techniques in unsupervised learning. Without requiring human participation, these algorithms identify occult patterns or data clusters. In Figure 6, the structure of process steps in unsupervised learning is shown.

Figure 6.

Unsupervised learning workflow.

3.2.1 Principal component analysis (PCA)

PCA is a typical linear dimension-reduction approach used to extract significant information from datasets by converting the input features to a new coordinate system, reducing the number of features while keeping the majority of the original information [60]. It provides a roadmap for reducing complex datasets to a smaller size in order to reveal a simplified structure. Because of its simplicity, it is a fundamental technique in data analysis and other domains. The principal components are linear combinations of the original attributes, constructed so that they capture as much of the variation in the data as feasible.

The components are constructed in a specific order. The first principal component is the linear combination of the original features with the greatest variance among all possible combinations. The second is the linear combination with the greatest variance among those uncorrelated with the first, and each subsequent component follows the same rule. In this way, the essential structure of a dataset can be captured with only a few components. The PCA execution procedures are listed below.

  • Standardize the data.

  • Construct the covariance matrix of the standardized features.

  • Compute the eigenvalues and eigenvectors of the covariance matrix to determine the new basis.

  • Rank the eigenvectors by eigenvalue and retain the leading components that capture most of the variance.
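The steps above can be sketched directly with NumPy's eigendecomposition (a generic illustration; the function and variable names are our own):

```python
import numpy as np

def pca(X, n_components=2):
    # Step 1: standardize the data (zero mean, unit variance per feature).
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    # Step 2: construct the covariance matrix of the standardized features.
    cov = np.cov(Xs, rowvar=False)
    # Step 3: eigendecompose the covariance matrix; eigenvectors define the
    # new basis, eigenvalues give the variance along each direction.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Step 4: rank components by explained variance, keep the leading ones.
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]
    explained_ratio = eigvals[order] / eigvals.sum()
    # Project the data onto the retained principal components.
    return Xs @ components, explained_ratio

# Usage: five features that really live in a 2-D subspace (plus noise)
# are reduced to two principal components.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 2))
extra = base @ rng.normal(size=(2, 3)) + 0.01 * rng.normal(size=(200, 3))
X = np.column_stack([base, extra])
Z, ratio = pca(X, n_components=2)
```

Because the last three features are nearly linear combinations of the first two, the two leading components account for almost all of the variance.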

In TE material machine learning studies, PCA has mostly been applied to reduce the input dimensions during model creation [40]. Two examples of applications using PCA follow. In their study of estimating force constants, Roekeghem et al. used PCA to transform the original descriptors and selected the top 10 principal components as regression model inputs [40]. Wagner and Rondinelli [61] employed PCA to transform strongly correlated mode characteristics and used the first three principal components in conjunction with decision trees to forecast high-temperature perovskites.

3.3 Reinforcement learning

Reinforcement learning is a machine learning training method based on rewarding desired behaviors and/or punishing undesired ones, as displayed in Figure 7. In general, a reinforcement learning agent can perceive and interpret its environment, take actions and learn through trial and error.

Figure 7.

Reinforcement learning workflow.

3.3.1 Gradient boosting (GB)

A class of supervised machine learning techniques known as gradient boosting employs an ensemble of weak learners to produce a strong learner. The approach works by gradually adding weak learners to the ensemble, each of which corrects the mistakes of the one before it. Decision trees are frequently used as the weak learners, and a gradient descent technique is used to train the ensemble. As a result, the combined model is more precise than any of the individual weak learners. XGBoost, LightGBM, and CatBoost are a few well-known gradient-boosting implementations [62]. Compared to other machine learning methods like random forests and support vector machines, gradient boosting approaches often offer superior accuracy, because the algorithm gradually improves by learning from the errors of the earlier, less accurate learners.
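The additive correction scheme described above can be illustrated from scratch with decision stumps as the weak learners — a toy sketch of the principle, not the actual implementation of XGBoost, LightGBM, or CatBoost:

```python
import numpy as np

def fit_stump(x, residual):
    """Best single-split decision stump (squared error) on a 1-D feature."""
    best = None
    for t in np.unique(x)[:-1]:        # skip max value: empty right side
        left, right = residual[x <= t], residual[x > t]
        sse = (((left - left.mean()) ** 2).sum()
               + ((right - right.mean()) ** 2).sum())
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    return best[1], best[2], best[3]

def gradient_boost(x, y, n_rounds=100, lr=0.2):
    """Start from the mean, then repeatedly fit a stump to the residuals
    (the negative gradient of the squared loss) and add it, scaled by lr."""
    pred = np.full(len(y), y.mean())
    ensemble = []
    for _ in range(n_rounds):
        t, left_val, right_val = fit_stump(x, y - pred)
        pred += lr * np.where(x <= t, left_val, right_val)
        ensemble.append((t, left_val, right_val))
    return pred, ensemble

# Usage: recover a noisy step function; each round corrects the errors
# left by the previous weak learners.
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 300)
y = np.where(x < 5, 1.0, 3.0) + rng.normal(0, 0.1, 300)
pred, ensemble = gradient_boost(x, y)
```

The learning rate deliberately shrinks each stump's contribution, so many small corrections accumulate into an accurate strong learner rather than one stump dominating.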

3.3.2 Feature learning

It is crucial to use appropriate descriptors that provide sufficient detail about the associated attributes if one wants to forecast material properties effectively and precisely. It is often unclear how to pick the appropriate descriptors from the vast array available to characterize the nature of materials. Additionally, in order to prevent overfitting and ensure that they generalize adequately to new, unseen data, machine learning algorithms often require substantially more samples than features or identifiers. To address this issue, efforts have been made either to choose the most informative descriptors or to transform the descriptors into a feature set of reduced dimension while retaining the original information. These two strategies are called feature selection and feature engineering, respectively [63]. Even without domain knowledge, descriptors can be derived or combined in a purely data-driven fashion, using either supervised or unsupervised methods. Currently, Pearson correlation, principal component analysis, and autoencoders are the most widely used techniques for data-driven feature selection and engineering for TE materials.

The foundation for the effective use of machine learning algorithms in the design and discovery of TE materials is the search for, identification and selection of relevant, dominant material descriptors or features with enough numerical weight to enable precise model predictions. The collection of arguments that must be provided to a given model is commonly referred to as the feature fingerprint or identifiers. Feature selection frequently requires a strong awareness of hidden relationships between input and desired output, as well as domain expertise. Problems emerge when important material identifiers are missing from the original dataset, or when feature engineering of particular inputs cannot express these descriptors numerically. The features should be chosen so that the influence of each individual input variable on the final dependent target output is significant, though this influence is not always obvious. In other words, one’s intuition, skill or subject knowledge plays a major role in successfully finding and choosing relevant features. Relying on intuition in feature engineering can cut both ways: at its best, it may yield fresh perspectives or even the identification of fundamental physical rules; at its worst, it may lead to the inclusion of irrelevant features, which is generally discouraged because they do not significantly affect the model’s overall prediction. Thus, the choice of features should first be guided by physical principles that have been demonstrated to apply specifically to the structure-property correlations of the materials being considered. Despite significant advances in the development of thermoelectric materials databases, data-driven TE materials design and discovery remain hampered by the lack of diverse datasets containing essential material descriptors, materials synthesis parameters and sufficiently large experimental data volumes.

3.3.3 Pearson correlation

The linear relationship between two random variables is denoted by the Pearson correlation or Pearson correlation coefficient. It is defined as follows [64]

ρ(X, Y) = cov(X, Y) / (σX σY)          (7)

where cov(X, Y) represents the covariance of the random variables X and Y, and σX, σY represent their standard deviations. In machine learning, the Pearson correlation is frequently used for feature selection. A coefficient with absolute value close to 1 indicates that two variables are closely linearly related, so one of them can be left out of a machine learning model to save computing cost. Furthermore, computing the Pearson correlation coefficient between material descriptors and target qualities can indicate how well the descriptors predict the target linearly. A significant Pearson correlation coefficient was found between the power factor and both the electronic bandgap and carrier effective mass in Ref. [64], which used linear regression to determine the material power factor.
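Eq. (7) and its use for filtering redundant features can be sketched as follows (a generic illustration; the 0.95 threshold and helper names are our own choices):

```python
import numpy as np

def pearson(x, y):
    """Eq. (7): rho(X, Y) = cov(X, Y) / (sigma_X * sigma_Y)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return ((x - x.mean()) * (y - y.mean())).mean() / (x.std() * y.std())

def drop_redundant(X, names, threshold=0.95):
    """Keep a feature only if its |rho| with every already-kept feature is
    below the threshold; near-duplicates are dropped to save compute."""
    kept = []
    for j in range(X.shape[1]):
        if all(abs(pearson(X[:, j], X[:, k])) < threshold for k in kept):
            kept.append(j)
    return [names[k] for k in kept], X[:, kept]

# Usage: the third feature is (almost) a multiple of the first, so the
# filter removes it while keeping the two independent features.
rng = np.random.default_rng(2)
a, b = rng.normal(size=100), rng.normal(size=100)
X = np.column_stack([a, b, 2 * a + 1e-6 * rng.normal(size=100)])
names, X_reduced = drop_redundant(X, ["a", "b", "c"])
```

Note that the coefficient is scale-invariant: a feature and any positive multiple of it have ρ = 1 exactly, which is why linear duplicates are caught regardless of units.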

3.3.4 Auto encoders

Autoencoders are frequently used to transform the initial feature vectors into a lower-dimensional vector known as a hidden (latent) vector, serving the same goal as PCA as an unsupervised learning technique. To recover high-level representations of the original characteristics, autoencoders train neural networks with identically sized input and output layers. An autoencoder is trained to minimize the reconstruction error between the network’s output and its input. The benefit of autoencoders over PCA is their capacity for nonlinear transformation of the original feature vectors.
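A minimal autoencoder of this kind — identically sized input and output, a smaller hidden (latent) layer, trained to minimize reconstruction error — can be sketched in NumPy as follows (a generic illustration with our own naming, using a tanh bottleneck for the nonlinear transformation):

```python
import numpy as np

def train_autoencoder(X, code_dim=2, lr=0.1, epochs=4000, seed=0):
    """Encoder/decoder pair with a tanh bottleneck, trained by gradient
    descent so the output reconstructs the input."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    We = rng.normal(0, 0.3, (d, code_dim))   # encoder weights
    Wd = rng.normal(0, 0.3, (code_dim, d))   # decoder weights
    for _ in range(epochs):
        code = np.tanh(X @ We)    # lower-dimensional hidden vector
        recon = code @ Wd         # reconstruction of the input
        err = recon - X           # reconstruction error to minimize
        gWd = code.T @ err / n
        dcode = (err @ Wd.T) * (1 - code ** 2)
        gWe = X.T @ dcode / n
        We -= lr * gWe
        Wd -= lr * gWd
    mse = ((np.tanh(X @ We) @ Wd - X) ** 2).mean()
    encode = lambda Xn: np.tanh(Xn @ We)     # original -> hidden vector
    return encode, mse

# Usage: 5-D data that actually lies in a 2-D subspace is compressed to a
# 2-D hidden vector with low reconstruction error.
rng = np.random.default_rng(3)
Z = rng.normal(size=(200, 2))
X = Z @ rng.normal(0, 0.5, (2, 5))
encode, mse = train_autoencoder(X, code_dim=2)
```

With deeper encoder/decoder stacks the same training objective yields genuinely nonlinear compressions, which is the advantage over PCA noted above.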

Advertisement

4. Literature review: machine learning techniques on thermoelectric materials

ML-based technologies are becoming more and more crucial in the field of TE materials due to the abundance of data from high-throughput investigations. For example, researchers use machine learning to uncover new TE materials [65] and estimate TE parameters such as the band gap [66, 67], thermal conductivity [68], and Seebeck coefficient [69]. In the field of TE materials, most of the works mainly focus on improving the accuracy of predictive models [65]. Nonlinear models have been shown to outperform linear models [70, 71, 72]. However, most nonlinear machine learning algorithms are treated as black boxes, as they are too complex to be interpretable by humans, hindering the widespread adoption of machine learning. In this section, we present a comprehensive overview of machine learning research in the thermoelectric field, referencing the important works previously reported in the literature. In recent years, ML has seen widespread application in the materials and chemical sciences, attributed to the rapid development of artificial intelligence technology, especially machine learning methods, and their high efficiency and informativeness [21, 22, 73, 74]. Numerous studies applying ML to thermoelectric materials have been reported.

4.1 Machine learning studies focus on electrical transport properties

In the literature, adopting high-throughput first-principles calculations has produced the largest computational database of transport properties, covering approximately 48,000 materials [27, 54, 75, 76, 77, 78, 79]. In these studies, the band structure of materials is computed and combined with Boltzmann transport theory to determine TE-related parameters such as electronic conductivity, electronic thermal conductivity and Seebeck coefficient. Most of the data are hosted on the Materials Project website, as shown in Figure 8, including its database entries.

Figure 8.

A snapshot of the Materials Project website and summary of its database contents.

This database also covers the transport properties of materials at various constant doping carrier concentrations, Fermi energies and temperatures. It has been proven by further research that these calculation results have a fair agreement with the experimentally measured maximum Seebeck coefficient. This reliable and abundant database is a valuable resource for machine learning-based TE material exploration techniques [80, 81]. TE features obtained from both experimental characterizations and/or theoretical calculations are crucial data sources for the machine learning process to discover new efficient TE materials. The developed machine learning model is a powerful tool for stoichiometry and nanostructure optimization for TE materials.

In the literature, machine learning techniques are used in the calculation and/or estimation of the thermoelectric power factor, which involves two crucial electrical transport properties: the Seebeck coefficient and the electrical conductivity. Some popular ways of improving the PF are band engineering [9, 82], modulation doping [6, 66, 67] and altering the effective mass of the energy band [83]. Doping is a well-known method for improving material TE characteristics, and following this route could lead to the discovery of new and efficient TE materials [84, 85, 86, 87]. A few important studies from the literature are as follows: Wang et al. adopted a machine-learning technique to optimize the Cu content in Cu-doped Bi2Te2.85Se0.15 [88]. The experimentally measured ZT at varying Cu content was used as the label for an ANN, and the resulting model, with a correlation coefficient of 0.99, shows excellent accuracy. Also, Hou et al. used a machine learning-based framework to discover the suitable Al/Si ratio in Al2Fe3Si3 for TE applications [89]. The model was developed from experimentally determined power factors and used to predict the power factor of unsynthesized materials; the optimum material ratio increased the power factor by approximately 40%.

The ideal internal stress for TE materials was also determined using machine learning techniques. The link between XRD (X-ray diffraction) and the Seebeck coefficient of materials was discovered by Saaki et al. using machine learning [90]. Ideal stresses of 3–4% and 1–2% along the a and c axes, respectively, are predicted by the trained model to significantly increase the Seebeck coefficient.

Furmanchuk et al. proposed an ML solution that can predict the Seebeck coefficients of crystalline materials in the temperature range of 300–1000 K [69]; according to their analysis, the thermal conductivity of a material, derived from its chemical elements, is an important input for Seebeck coefficient estimation at all temperatures. Beyond ranking feature importance, certain ML models may explicitly provide equations relating compound attributes and identifiers. Such an approach may be used to discern positive and negative correlations between descriptors and targets by examining the coefficients of the formulae.

Using high-throughput ab initio calculations and regression analysis, Wang et al. showed that the power factor correlates positively with the band gap and the charge-carrier effective mass [23]. They also found that materials with many atoms per unit cell typically have a high power factor.

The Seebeck coefficient, electrical conductivity, thermal conductivity, and band gap are used to determine the TE potential of a material in a web-based recommendation engine developed by Oliynyk et al. [19]. With no structural input, over more than 400,000 possible combinations of elements, their Heusler discovery engine surpasses competing methods by quickly and precisely predicting Heusler vs. non-Heusler compounds. The model has a true positive rate of 0.94.

Due to their great precision and speed, applications of ML in thermoelectric materials are being researched more and more. By producing attributes from experimentally verified chemical formulae, Iwasaki et al. published an ML model that sped up the discovery of new candidate materials [91]. In yet another study, on a spin-driven thermoelectric effect device, descriptors for training the ML model were automatically produced from the composition using a composition-based feature vector (CBFV) [92]. The findings demonstrated the significance of certain parameters for thermopower, including atomic weight and spin and orbital angular momentum. Wang et al. also used ML to study the CuxBi2Te2.85+ySe0.15 system [53]. Principal component analysis (PCA) and a regression technique were used to study the relationship between microstructure and thermoelectric qualities. It was also shown that ML can guide experimental design to obtain a high ZT value, in addition to forecasting the features of novel materials.

An effective method for determining the Al2Fe3Si3 thermoelectric compound’s ideal chemical composition was described by Hou et al. [20]. The Bayesian Optimization (BO) algorithm allows for successful application of machine learning to the experiment. When compared to the sample with an initial Al/Si ratio of 0.9, the power factor may be increased by roughly 40%. The framework of this study, according to the authors, might also be used for Al2Fe3Si3 that has been exogenously doped.

The most typical method for enhancing ZT is to exogenously introduce certain elements into the BiCuSeO structure in order to lower thermal conductivity, raise carrier concentration and enhance electrical transport characteristics. With so many candidate dopants available, however, painstaking experimental testing would be required. As a result, using ML to direct the effective doping of BiCuSeO may be a smart way to find a solution [93, 94, 95].

Iwasaki et al. used supervised ML models to establish key physical parameters controlling the spin-driven thermoelectric effect and proposed a new material that shows promising results [44]. They established the fundamental physical parameters governing spin-driven thermoelectric (STE) materials using machine learning modeling. Their real material synthesis, which was guided by the models, resulted in the discovery of a novel STE material with a thermopower order of magnitude greater than that of the current generation of STE devices.

In 2016, Fan et al. proposed a mathematical model to calculate the optimal length and cross-sectional area of a thermoelectric generator (TEG) to maximize power output. They found that maximum power was obtained from the TEG at the optimum length-to-cross-sectional-area ratio under convective thermal boundary conditions [96]. In another TEG study, Wu et al. used a local optimization method to maximize the efficiency of a segmented TEG by adjusting the thermoelement cross-sectional area and the thickness of each segment; the total efficiency of the TEG reached 23.72% [97]. In the work of Ferreira-Teixeira and Pereira, thermocouples (TCs) made up of a p- and an n-type leg, and thermoelectric devices with various geometries, are numerically modeled using the COMSOL Multiphysics programme to find an optimized geometry. They reported that the optimal ratio between thermoelectric height and width should be 5 × 10−3 [98]. They also stated that the optimal height ratio between the Cu contacts and the thermoelectric leg was 40. The impacts of structural factors and thermodynamic boundary conditions on output performance were examined for identical p-n segmentation ratios by Ma et al. [99]. They reported that longer thermoelectric elements and greater heat transfer coefficients increase the ideal fraction of the medium-temperature material (CoSb3), whereas the cross-sectional area has no effect. A second configuration then examined the power-improvement capability in light of the differences between the properties of p-type and n-type materials. Compared to the segmented model’s initial value, the maximum output power improved by about 13.8%. Finally, using the best segmented-ratio design in a thermoelectric generator system showed improved performance and boosted output power by 6.8%. Kim et al. estimated the performance of a TEG running on a diesel engine using ANNs implemented in Python [100]. Validation studies found a 3.49% difference between the experimental and predicted TEG output power. Wang et al. presented a fast and accurate DL model to predict the performance of TEGs [101]; the proposed deep learning model improved the power output of the TEG by 182%. Kishore et al. provided ANN models that can predict TEG performance and found that an ANN with two hidden layers of six neurons each was most efficient at predicting TEG performance [102]. The optimum ANN model estimated the power and efficiency of the TEG with accuracies of ±0.1 W and ±0.2%, respectively, in under 26.4 microseconds per data point, compared to the 6 minutes required by traditional finite-element simulations. The input parameters are leg length, leg cross-sectional area and external resistance. They also noted that increasing the number of neurons per layer beyond this optimum does not improve the prediction accuracy of the ANN.

In the study by Zhu et al., a DL technique is used to forward-simulate the maximum power output and efficiency of a thermoelectric generator, as well as for generator design and optimization [103]. After being trained on a dataset of 5000 3-D finite-element-method simulations, artificial neural networks with five layers and 400 neurons per layer displayed extraordinarily high prediction accuracy of over 98%. Furthermore, they can operate under constant-heat-flux and constant-temperature-difference conditions while taking into consideration thermoelectric phenomena such as contact electrical resistance and surface heat transfer.

Ang et al. predicted a TEG’s energy output in its operational environment using an ANN model [104]. A multilayer perceptron (MLP) was trained in a supervised manner and evaluated on the dataset created using a verified finite volume approach to forecast the energy generated. Their model could also conduct reverse ANN to predict the input value when given an output value, in addition to forecasting the output values.

4.2 Machine learning studies focus on thermal transport properties

To make TE energy an economically feasible alternative for waste heat recovery, TE materials with ZT > 1 are necessary. The construction of phonon-glass electron-crystal structures that enable the separation of electron and phonon transport characteristics has been the main experimental emphasis in the investigation of oxide TE materials [105]. The development of thermoelectric oxides has mostly focused on techniques that enhance the hierarchical scattering of phonons, the most popular being the use of sintering additives [106, 107, 108]. The literature makes it abundantly evident that the class of materials explored for TE applications has so far been relatively narrow and that our knowledge of the electronic and phonon transport of crystalline alloys remains limited [109]. On the other hand, rapid advancements in materials informatics have aided researchers in finding novel, promising classes of materials and establishing links between design factors and thermoelectric characteristics [65]. According to high-throughput material modeling and ML approaches, large lattice parameters, a wide band gap, and a high effective hole mass are essential characteristics for nanostructured half-Heusler compounds to achieve a high TE efficiency. New semiconductors with extremely low κph values have been suggested for further experimental research [24]. In essence, there have been many theoretical and empirical attempts over the past few decades to evaluate the κph of various systems. From these data, ML can produce a mapping between input properties (such as the atomic mass, phonon frequency, and unit cell volume) and the target property κph. In contrast to first-principles calculations and MD simulations, data-driven ML models enable high-throughput evaluation of κph, with outstanding predictive potential for systems inside and outside the training set.

Juneja et al. performed high-throughput ab initio calculations on a dataset of 195 binary, ternary, and quaternary compounds [110]. Lattice thermal conductivity κph values, which range over three orders of magnitude, were calculated for 120 dynamically stable non-metallic compounds; among them, 11 ultrahigh- and 15 ultralow-κph materials were found. According to an investigation of the property map created for this dataset, κph depends strongly on four basic descriptors: average atomic mass, maximum phonon frequency, integrated Grüneisen parameter up to 3 THz, and unit cell volume. An ML model based on Gaussian process regression was created using these descriptors and predicted log-scaled κph with an exceptionally low root-mean-square error of 0.21.

Zhang and Ling suggested including a rough estimate of the target property, obtained from low-quality models, as a way to improve the accuracy of ML models applied to small datasets. By including approximate κph values from simple empirical models as descriptors, they achieved high accuracy in estimating κph [111]. Their investigation revealed the link between the degrees of freedom (DoF) of a model and its prediction accuracy as a significant issue when the model is trained on limited materials data. The precision-DoF relationship, which results from the statistical bias-variance tradeoff, limits prediction accuracy in unknown domains. They therefore suggested employing a crude estimate of the property in the feature space as a way to increase accuracy without increasing DoF. Incorporating the crude estimate significantly increased the prediction accuracy of ML models in three case studies, illustrating the applicability of the suggested method for building precise ML models from sparse materials data.

Chen et al. developed an ML-based model that combines sophisticated, general-purpose feature engineering with the Gaussian process regression technique to estimate the phonon contribution to the thermal conductivity of inorganic materials [112]. Using a benchmark dataset of around 100 experimentally characterized inorganic materials, their model can quickly and accurately screen inorganic materials.

Juneja et al. combined ML with high-throughput computation to create regression models predicting the κph of inorganic compounds, using the maximum phonon frequency and the integrated Grüneisen parameter as descriptors [113, 114]. Both ML models relied on complex derived features as descriptors, which limits their use in the early stages of material selection and design. ML models based on readily available characteristic material properties would be more effective for discovering new materials and shortening the design cycle.


The optimization of random multilayer structures (RMLs) is vital for achieving ultralow thermal conductivity, which is critical for a wide range of applications, including thermoelectric materials. Chakraborty et al. found some critical criteria for assessing disorder in RML layer thicknesses [115]. Classical molecular dynamics simulations of hypothetical Lennard-Jones RMLs supported their ability to associate these disorder characteristics with thermal conductivity. Furthermore, they demonstrated that these metrics may be used as features in physics-based machine-learning models to predict the lattice thermal conductivity of RMLs with greater accuracy and efficiency.

Half-Heusler compounds were utilized as prototype examples by Liu et al. to show how a compressed-sensing approach may be applied to quickly and accurately assess lattice thermal conductivity, as realized by a physically interpretable descriptor [116]. Seventy-five half- and 15 full-Heusler compounds’ thermal conductivities were predicted using the descriptor, and the results show good agreement with explicit first-principles findings. The descriptor was further improved by supplying only the fundamental characteristics of the constituent atoms, which helped hasten the search for materials with the proper thermal conductivity.

The heat conductivity of two-dimensional materials such as graphene may be easily controlled by inserting holes, the density and distribution of which are crucial characteristics. To investigate the link between hole distribution and thermal conductivity decrease in monolayer graphene, Wan et al. used an inverse design process based on machine learning [117]. According to their method, the best distribution for reducing thermal conductivity in porous graphene is one in which holes are randomly distributed transverse to the direction of heat flow yet exhibit some periodicity along the direction of heat flow.

Carbon honeycombs (CHCs) and boron nitride honeycombs (BNHCs) have been revealed to have identical molecular architectures but distinct thermal characteristics. Thus, hybrid carbon-boron nitride honeycombs (C-BNHCs) with adjustable thermal conductivity may be created by correctly patching together CHCs and BNHCs. Du et al. used the ML approach in conjunction with molecular dynamics simulations to examine the thermal transport property of C-BNHCs, as well as to design C-BNHC structures for specified thermal conductivity [118]. In the inverse design of C-BNHCs with any given thermal conductivity, their ML-based technique demonstrated remarkable accuracy and efficiency.

Zhu et al. estimated the thermal conductivity of all known inorganic materials in the Inorganic Crystal Structure Database using a combination of graph neural networks and random forest techniques, then charted the structural chemistry onto extended van Arkel triangles [119]. Using the newly constructed map and their theoretical tool, they identified rare-earth chalcogenides as promising candidates, with ZT values greater than 1.0.

Ju et al. demonstrated that, when a lower-order feature property available in big data is appropriately selected and used for transfer learning, large datasets can supplement small ones for accurate predictions [120]. A neural network was used to directly connect crystal information and thermal conductivity by transferring descriptors obtained from a model pre-trained on the feature property. The successful transfer learning demonstrated extrapolative prediction ability and revealed descriptors for lattice anharmonicity. The resulting model was used to screen over 60,000 compounds for unique crystals that might serve as diamond substitutes.

In order to detect unexpected lattice thermal conductivity κph enhancement in aperiodic superlattices versus periodic superlattices, Chowdhury and Ruan demonstrated a general-purpose adaptive ML-accelerated search process [121]. This process has implications for the thermal management of multilayer-based electronic devices. They employed molecular dynamics simulations to calculate κph with great precision, as well as a convolutional neural network (CNN) to forecast κph for a large number of structures. They repeatedly discovered aperiodic superlattices (SLs) with structural properties leading to locally improved heat transport and used them as extra training data for the CNN to enable accurate prediction for the target unknown SLs. Because of the existence of closely spaced surfaces, the detected structures displayed higher coherent phonon transport.

Advertisement

5. Summary and future perspectives

Thermoelectric materials are particularly beneficial in a variety of applications when they combine non-toxic, low-cost, earth-abundant, low-density and environmentally friendly properties. To address today’s energy issues, research on efficient thermoelectric materials is becoming increasingly important. The development of highly efficient thermoelectric materials has advanced significantly over the past few decades in both theoretical and practical investigations. Recent developments in nanotechnology, in particular, have introduced approaches that hold promise for improving the thermoelectric efficiency of basic systems. Although tremendous progress has been achieved in the literature, the scientific community is still concentrating its efforts on the discovery of a new generation of thermoelectric materials with the highest efficiency and applicability to everyday life. Therefore, the most important current goal in thermoelectric research is to find new and innovative thermoelectric material systems.

The purpose of this chapter was to discuss recent advances in machine learning-assisted thermoelectric material discovery. By learning the correlations between thermoelectric performance and material transport properties, machine learning can provide an advantageous discovery tool for new chemical compositions, nanostructural design, stoichiometry optimization and other applications. The newly acquired data can in turn be used to extend thermoelectric databases and improve the training performance of machine learning models. Active learning is recommended for further investigation. The workflow presented here can serve as a reference when applying AI-guided, data-driven methodologies to thermoelectric material discovery. Since most of the relevant studies have focused solely on using machine learning to discover and design materials with excellent thermoelectric performance, it is suggested that additional criteria, such as non-toxicity and earth-abundance, be included as additional outputs of the thermoelectric discovery tool.


Acknowledgments

Ö.C. Yelgel acknowledges the support from the University of Manchester, the National Graphene Institute and the School of Physics and Astronomy.


Conflict of interest

The authors declare no conflict of interest.

References

  1. Rowe DM. Thermoelectrics Handbook. Boca Raton: CRC Press; 2005
  2. Stabler FR. Automotive applications for high efficiency thermoelectrics. In: High Efficiency Workshop. San Diego, CA; 2002. p. 24
  3. Zhao LD, Wu HJ, Hao SQ, Wu CI, Zhou XY, Biswas K, et al. All-scale hierarchical thermoelectrics: MgTe in PbTe facilitates valence band convergence and suppresses bipolar thermal transport for high performance. Energy & Environmental Science. 2013;6:3346
  4. Yelgel ÖC. Theoretical study of thermoelectric properties of p-type Mg2Si1−xSnx solid solutions doped with Ga. Journal of Alloys and Compounds. 2017;691:151
  5. Xie W, Weidenkaff A, Tang X, Zhang Q, Poon J, Tritt TM. Recent advances in nanostructured thermoelectric half-Heusler compounds. Nanomaterials. 2012;2(4):379
  6. Liu W, Tan X, Yin K, Liu H, Tang X, Shi J, et al. Convergence of conduction bands as a means of enhancing thermoelectric performance of n-type Mg2Si1−xSnx solid solutions. Physical Review Letters. 2012;108:166601
  7. Gan Y, Wang G, Zhou J, Sun Z. Prediction of thermoelectric performance for layered IV-V-VI semiconductors by high-throughput ab initio calculations and machine learning. NPJ Computational Materials. 2021;7:176
  8. Xi L, Pan S, Li X, Xu Y, Ni J, Sun X, et al. Discovery of high-performance thermoelectric chalcogenides through reliable high-throughput material screening. Journal of the American Chemical Society. 2018;140:10785-10793
  9. Graziosi P, Kumarasinghe C, Neophytou N. Impact of the scattering physics on the power factor of complex thermoelectric materials. Journal of Applied Physics. 2019;126:155701
  10. Lu N, Han G, Feng Y, Sun Y, Lin G. Artificial intelligence assisted thermoelectric materials and discovery. Research Square. 2022;1:1-13
  11. Balachandran PV, Xue D, Theiler J, Hogden J, Lookman T. Adaptive strategies for materials design using uncertainties. Scientific Reports. 2016;6:19660
  12. Bassman L, Rajak P, Kalia RK, Nakano A, Sha F, Sun J, et al. Active learning for accelerated design of layered materials. NPJ Computational Materials. 2018;4:74
  13. Sheng Y, Wu Y, Yang J, Lu W, Villars P, Zhang W. Active learning for the power factor prediction in diamond-like thermoelectric materials. NPJ Computational Materials. 2020;6:171
  14. De Witte J. Data-efficient discovery of thermoelectric materials using deep learning [Master’s thesis]. Ghent: Ghent University, Faculty of Engineering and Architecture; 2020
  15. Han G, Sun Y, Feng Y, Lin G, Lu N. Artificial intelligence guided thermoelectric materials design and discovery. Advanced Electronic Materials. 2023;9:2300042
  16. Han G, Sun Y, Feng Y, Lin G, Lu N. Machine learning regression guided thermoelectric materials discovery. ES Materials and Manufacturing. 2021;14:20-35
  17. Xu Y, Xiangmeng W, Li X, Xi L, Ni J, Zhu W, et al. New materials band gap prediction based on the high-throughput calculation and the machine learning. Scientia Sinica Technologica. 2019;49:44-54
  18. Wang X, Xu Y, Yang J, Ni J, Zhang W, Zhu W. ThermoEPred-EL: Robust bandgap predictions of chalcogenides with diamond-like structure via feature cross-based stacked ensemble learning. Computational Materials Science. 2019;169:109117
  19. Oliynyk AO, Antono E, Sparks TD, Ghadbeigi L, Gaultois MW, Meredig B, et al. High-throughput machine learning-driven synthesis of full-Heusler compounds. Chemistry of Materials. 2016;28(20):7324-7331
  20. Hou Z, Takagiwa Y, Shinohara Y, Xu Y, Tsuda K. Machine-learning-assisted development and theoretical consideration for the Al2Fe3Si3 thermoelectric material. ACS Applied Materials & Interfaces. 2019;11(12):11545-11554
  21. Le T, Epa VC, Burden FR, Winkler DA. Quantitative structure-property relationship modelling of diverse materials properties. Chemical Reviews. 2012;112(5):2889-2919
  22. Pilania G, Wang C, Jiang X, Rajasekaran S, Ramprasad R. Accelerating materials property predictions using machine learning. Scientific Reports. 2013;3(1):1-6
  23. Wang S, Wang Z, Setyawan W, Mingo N, Curtarolo S. Assessing the thermoelectric properties of sintered compounds via high-throughput ab-initio calculations. Physical Review X. 2011;1(2):021012
  24. Carrete J, Li W, Mingo N, Wang S, Curtarolo S. Finding unprecedentedly low-thermal-conductivity half-Heusler semiconductors via high-throughput materials modelling. Physical Review X. 2014;4(1):011019
  25. Gorai P, Gao D, Ortiz B, Miller S, Barnett SA, Mason T, et al. TE Design Lab: A virtual laboratory for thermoelectric material design. Computational Materials Science. 2016;112:368-376
  26. TE Design Lab. Available from: https://tedesignlab.org [Accessed: March 13, 2022]
  27. Ricci F, Chen W, Aydemir U, Snyder GJ, Rignanese GM, Jain A, et al. An ab initio electronic transport database for inorganic materials. Scientific Data. 2017;4:170085
  28. Data from: An ab initio electronic transport database for inorganic materials. Available from: https://datadryad.org/stash/dataset/doi:10.5061/dryad.gn001 [Accessed: April 02, 2023]
  29. Chen L, Tran H, Batra R, Kim C, Ramprasad R. Machine learning models for the lattice thermal conductivity prediction of inorganic materials. Computational Materials Science. 2019;170:109155
  30. Katsura Y, Kumagai M, Kodani T, Kaneshige M, Ando Y, Gunji S, et al. Data-driven analysis of electron relaxation times in PbTe type thermoelectric materials. Science and Technology of Advanced Materials. 2019;20:511-520
  31. Starrydata Dataset. Available from: https://github.com/starrydata/starrydata_datasets [Accessed: March 13, 2022]
  32. Priya P, Aluru N. Accelerated design and discovery of perovskites with high conductivity for energy applications through machine learning. NPJ Computational Materials. 2021;7:90
  33. Data from: Accelerated design and discovery of perovskites with high conductivity for energy applications through machine learning. Available from: https://figshare.com/s/10b18051e26fa4d4f18c [Accessed: April 02, 2023]
  34. Jaafreh R, Kang YS, Hamad K. Lattice thermal conductivity: An accelerated discovery guided by machine learning. ACS Applied Materials & Interfaces. 2021;13:57204-57213
  35. Miyazaki H, Tamura T, Mikami M, Watanabe K, Ide N, Ozkendir OM, et al. Machine learning based prediction of lattice thermal conductivity for half-Heusler compounds using atomic information. Scientific Reports. 2021;11:1-8
  36. Yao M, Wang Y, Li X, Sheng Y, Huo H, Xi L, et al. Materials informatics platform with three dimensional structures, workflow and thermoelectric applications. Scientific Data. 2021;8:236
  37. MatHub-3d. Available from: http://www.mathub3d.net/materials/matdb [Accessed: April 02, 2023]
  38. Tranås R, Løvvik OM, Tomic O, Berland K. Lattice thermal conductivity of half-Heuslers with density functional theory and machine learning: Enhancing predictivity by active sampling with principal component analysis. Computational Materials Science. 2022;202:110938
  39. Thursby JG, Schmidt P. Some properties of tests for specification error in a linear regression model. Journal of the American Statistical Association. 1977;72(359):635-641
  40. van Roekeghem A, Carrete J, Oses C, Curtarolo S, Mingo N. High-throughput computation of thermal conductivity of high-temperature solid phases: The case of oxide and fluoride perovskites. Physical Review X. 2016;6:041061
  41. De Jong M, Chen W, Notestine R, Persson K, Ceder G, Jain A, et al. A statistical learning framework for materials science: Application to elastic moduli of k-nary inorganic polycrystalline compounds. Scientific Reports. 2016;6:34256
  42. Miller SA, Dylla M, Anand S, Gordiz K, Snyder GJ, Toberer ES. Empirical modeling of dopability in diamond-like semiconductors. NPJ Computational Materials. 2018;4:1-8
  43. Li Z, Xu Q, Sun Q, Hou Z, Yin WJ. Stability engineering of halide perovskite via machine learning. Advanced Functional Materials. 2019;29:1807280
  44. Iwasaki Y, Takeuchi I, Stanev V, Kusne AG, et al. Machine-learning guided discovery of a new thermoelectric material. Scientific Reports. 2019;9:1-7
  45. Tian Y, Shi Y, Liu X. Recent advances on support vector machines research. Technological and Economic Development of Economy. 2012;18(1):5-33
  46. Ray S. A quick review of machine learning algorithms. In: 2019 International Conference on Machine Learning, Big Data, Cloud and Parallel Computing (COMITCon). Faridabad, India; 2019. pp. 35-39
  47. Allers J, Harvey J, Garzon F, Alam T. Machine learning prediction of self diffusion in Lennard-Jones fluids. Journal of Chemical Physics. 2020;153:034102
  48. Breiman L. Random forests. Machine Learning. 2001;45:5-32
  49. Friedman JH. Greedy function approximation: A gradient boosting machine. The Annals of Statistics. 2001;29(5):1189-1232
  50. Gaultois MW, Oliynyk AO, Mar A, Sparks TD, Mulholland GJ, Meredig B. Web-based machine learning models for real-time screening of thermoelectric materials properties. APL Materials. 2016;4:053213
  51. Furmanchuk A, Saal J, Doak JW, Olson GB, Choudhary A, Agrawal A. Prediction of Seebeck coefficient for compounds without restriction to fixed stoichiometry: A machine learning approach. Journal of Computational Chemistry. 2018;39:191-201. DOI: 10.1002/jcc.25067
  52. Wu YJ, Fang L, Xu Y. Predicting interfacial thermal resistance by machine learning. NPJ Computational Materials. 2019;5:56
  53. Wang ZL, Adachi Y, Chen ZC. Processing optimization and property predictions of hot-extruded Bi–Te–Se thermoelectric materials via machine learning. Advanced Theory and Simulations. 2020;3:1900197
  54. Han G, Sun Y, Feng Y, Lin G, Lu N. Machine learning regression guided thermoelectric materials discovery – A review. ES Materials Manufacturing. 2021;14:20-35
  55. Hornik K, Stinchcombe M, White H. Multilayer feedforward networks are universal approximators. Neural Networks. 1989;2:359-366
  56. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521:436-444
  57. Wang Y, Jiang Y, Lan J. FCNN: An efficient intrusion detection method based on raw network traffic. Security and Communication Networks. 2021;13:5533269
  58. Hochreiter S, Schmidhuber J. Long short-term memory. Neural Computation. 1997;9(8):1735-1780
  59. Shrestha A, Mahmood A. Review of deep learning algorithms and architectures. IEEE Access. 2019;7:53040-53065. DOI: 10.1109/ACCESS.2019.2912200
  60. Shlens J. A tutorial on principal component analysis. arXiv preprint arXiv:1404.1100. 2014
  61. Wagner N, Rondinelli JM. Theory-guided machine learning in materials. Frontiers in Materials. 2016;3:28
  62. Natekin A, Knoll A. Gradient boosting machines, a tutorial. Frontiers in Neurorobotics. 2013;7:21
  63. Ong SP, Richards WD, Jain A, Hautier G, Kocher M, Cholia S, et al. Python materials genomics (pymatgen): A robust, open-source python library for materials analysis. Computational Materials Science. 2013;68:314-319
  64. Mbaye MT, Pradhan SK, Bahoura M. Data-driven thermoelectric modelling: Current challenges and prospects. Journal of Applied Physics. 2021;130:190902
  65. Wang T, Zhang C, Snoussi H, Zhang G. Machine learning approaches for thermoelectric materials research. Advanced Functional Materials. 2020;30(5):1906041
  66. Zhang J, Liu R, Cheng N, Zhang Y, Yang J, Uher C, et al. High-performance pseudocubic thermoelectric materials from non-cubic chalcopyrite compounds. Advanced Materials. 2014;26:3848-3853
  67. Pei Y, Shi X, LaLonde A, Wang H, Chen L, Snyder GJ. Convergence of electronic bands for high performance bulk thermoelectrics. Nature. 2011;473:66-69
  68. Wang X, Zeng S, Wang Z, Ni J. Identification of crystalline materials with ultra-low thermal conductivity based on machine learning study. The Journal of Physical Chemistry C. 2020;124:8488-8495
  69. Furmanchuk A, Saal JE, Doak JW, Olson GB, Choudhary A, Agrawal A. Prediction of Seebeck coefficient for compounds without restriction to fixed stoichiometry: A machine learning approach. Journal of Computational Chemistry. 2018;39:191-202
  70. Gladkikh V, Kim DY, Hajibabaei A, Jana A, Myung CW, Kim KS. Machine learning for predicting the band gaps of ABX3 perovskites from elemental properties. The Journal of Physical Chemistry C. 2020;124:8905-8918
  71. Ramprasad R, Batra R, Pilania G, Mannodi-Kanakkithodi A, Kim C. Machine learning in materials informatics: Recent applications and prospects. NPJ Computational Materials. 2017;3:54
  72. Schmidt J, Marques MRG, Botti S, Marques MAL. Recent advances and applications of machine learning in solid-state materials science. NPJ Computational Materials. 2019;5:83
  73. Panapitiya G, Avendano FG, Ren P, Wen X, Li Y, Lewis JP. Machine learning prediction of CO adsorption in thiolated, Ag-alloyed Au nanoclusters. Journal of the American Chemical Society. 2018;140(50):17508-17514
  74. Rajan AC, Mishra A, Satsangi S, Vaish R, Mizuseki H, Lee KR, et al. Machine-learning-assisted accurate band gap predictions of functionalized MXene. Chemistry of Materials. 2018;30(12):4031-4038
  75. Hao Q, Xu D, Lu N, Zhao H. High-throughput ZT predictions of nanoporous bulk materials as next-generation thermoelectric materials: A material genome approach. Physical Review B. 2016;93:205206
  76. Curtarolo S, Hart GLW, Nardelli MB, Mingo N, Sanvito S, Levy O. The high-throughput highway to computational materials design. Nature Materials. 2013;12:191-201
  77. Greeley J, Jaramillo TF, Bonde J, Chorkendorff I, Jens K. Computational high-throughput screening of electrocatalytic materials for hydrogen evolution. Nature Materials. 2006;5:909-913
  78. Bhattacharya S, Chmielowski R, Dennler G, Madsen GKH. Novel ternary sulfide thermoelectric materials from high throughput transport and defect calculations. Journal of Materials Chemistry A. 2016;4:11086-11093
  79. Liu Z, Fu B, Yi X, Yuan G, Wang J, Ferguson I. Co-doping of magnesium with indium in nitrides: First principle calculation and experiment. RSC Advances. 2016;6:5111-5115
  80. Bishara D, Xie Y, Liu WK, Li S. A state-of-the-art review on machine learning-based multiscale modelling, simulation, homogenization and design of materials. Archives of Computational Methods in Engineering. 2023;30:191-222
  81. Choudhary K, Garrity KF, Tavazza F. Data-driven discovery of 3D and 2D thermoelectric materials. Journal of Physics: Condensed Matter. 2020;32:475501
  82. Graziosi P, Kumarasinghe C, Neophytou N. Material descriptors for the discovery of efficient thermoelectrics. ACS Applied Energy Materials. 2020;3:5913-5926
  83. Fu C, Zhu T, Liu Y, Xie H, Zhao X. Band engineering of high performance p-type FeNbSb based half-Heusler thermoelectric materials for figure of merit zT > 1. Energy & Environmental Science. 2015;8:216-220
  84. Wang ZL, Yokoyama Y, Onda T, Adachi Y, Chen ZC. Influence of algorithm parameters of Bayesian optimization, genetic algorithm, and particle swarm optimization on their optimization performance. Advanced Theory and Simulations. 2019;5:1900079
  85. Wang B, Kucukgok B, He Q, Melton AG, Leach J, Udwary K, et al. Thermoelectric properties of undoped and Si-doped bulk GaN. MRS Online Proceedings Library. 2013;1558:903
  86. Li J, Sui J, Pei Y, Meng X, Berardan D, Dragoe N, et al. The roles of Na doping in BiCuSeO oxyselenides as a thermoelectric material. Journal of Materials Chemistry A. 2014;2:4903-4906
  87. Zhang D, Yang J, Jiang Q, Fu L, Xiao Y, Luo Y, et al. Improvement of thermoelectric properties of Cu3SbSe4 compound by In doping. Materials and Design. 2016;98:150-154
  88. Zhou B, Li S, Li W, Li J, Zhang X, Lin S, et al. Thermoelectric properties of SnS with Na-doping. ACS Applied Materials & Interfaces. 2017;9:34033-34041
  89. Hou Z, Takagiwa Y, Shinohara Y, Xu Y, Tsuda K. Fe–Al–Si thermoelectric (FAST) materials and modules: Diffusion couple and machine-learning-assisted materials development. ACS Applied Materials & Interfaces. 2019;11:11545-11554
  90. Sasaki M, Ju S, Xu Y, Shiomi J, Goto M. Identifying optimal strain in bismuth telluride thermoelectric film by combinatorial gradient thermal annealing and machine learning. ACS Combinatorial Science. 2020;22:782-790
  91. Iwasaki Y, Takeuchi I, Stanev V, Kusne AG, Ishida M, Kirihara A, et al. Machine-learning guided discovery of a new thermoelectric material. Scientific Reports. 2019;9:2751
  92. Murdock RJ, Kauwe SK, Wang AYT, Sparks TD. Is domain knowledge necessary for machine learning materials properties? Integrating Materials and Manufacturing Innovation. 2020;9:221-227
  93. Li F, Ruan M, Chen Y, Wang W, Luo J, Zheng Z, et al. Enhanced thermoelectric properties of polycrystalline BiCuSeO via dual-doping in Bi sites. Inorganic Chemistry Frontiers. 2019;6:799-807
  94. Das S, Valiyaveettil S, Chen KH, Suwas S, Mallik R. Thermoelectric properties of Pb and Na dual doped BiCuSeO. AIP Advances. 2019;9:015025
  95. Feng B, Li G, Pan Z, Hu X, Liu P, Li Y, et al. Enhanced thermoelectric performances in BiCuSeO oxyselenides via Er and 3D modulation doping. Ceramics International. 2019;45:4493-4498
  96. Fan L, Zhang G, Wang R, Jiao K. A comprehensive and time-efficient model for determination of thermoelectric generator length and cross-section area. Energy Conversion and Management. 2016;122:85-94
  97. Wu Y, Yang J, Chen S, Zuo L. Thermo-element geometry optimization for high thermoelectric efficiency. Energy. 2018;147:672-680
  98. Ferreira-Teixeira S, Pereira AM. Geometrical optimization of a thermoelectric device: Numerical simulations. Energy Conversion and Management. 2018;169:217-227
  99. Ma X, Shu G, Tian H, Xu W, Chen T. Performance assessment of engine exhaust-based segmented thermoelectric generators by length ratio optimization. Applied Energy. 2019;248:614-625
  100. Kim TY. Prediction of system-level energy harvesting characteristics of a thermoelectric generator operating in a diesel engine using artificial neural networks. Energies. 2021;14:2426
  101. Wang P, Wang K, Xi L, Gao R, Wang B. Fast and accurate performance prediction and optimization of thermoelectric generators with deep neural networks. Advanced Materials Technologies. 2021;6:2100011
  102. Kishore R, Mahajan R, Priya S. Combinatory finite element and artificial neural network model for predicting performance of thermoelectric generator. Energies. 2018;11:2216
  103. Zhu Y, Newbrook DW, Dai P, de Groot CHK, Huang R. Artificial neural network enabled accurate geometrical design and optimisation of thermoelectric generator. Applied Energy. 2022;305:117800
  104. Ang ZYA, Woo WL, Mesbahi E. Artificial neural network based prediction of energy generation from thermoelectric generator with environmental parameters. Journal of Clean Energy Technologies. 2017;5:458-463
  105. He J, Liu Y, Funahashi R. Oxide thermoelectrics: The challenges, progress, and outlook. Journal of Materials Research. 2011;26(15):1762-1772
  106. Wang N, He H, Ba Y, Wan C, Koumoto K. Thermoelectric properties of Nb-doped SrTiO3 ceramics enhanced by potassium titanate nanowires addition. Journal of the Ceramic Society of Japan. 2010;118(1383):1098-1101
  107. Buscaglia MT, Maglia F, Anselmi-Tamburini U, Marré D, Pallecchi I, Ianculescu A, et al. Effect of nanostructure on the thermal conductivity of La-doped SrTiO3 ceramics. Journal of the European Ceramic Society. 2014;34(2):307-316
  108. Lan J, Lin YH, Liu Y, Xu S, Nan CW. High thermoelectric performance of nanostructured In2O3-based ceramics. Journal of the American Ceramic Society. 2012;95(8):2465-2469
  109. Minnich A, Dresselhaus MS, Ren Z, Chen G. Bulk nanostructured thermoelectric materials: Current research and future prospects. Energy & Environmental Science. 2009;2(5):466-479
  110. Juneja R, Yumnam G, Satsangi S, Singh AK. Coupling the high-throughput property map to machine learning for predicting lattice thermal conductivity. Chemistry of Materials. 2019;31:5145-5151
  111. Zhang Y, Ling C. A strategy to apply machine learning to small datasets in materials science. NPJ Computational Materials. 2018;4(1):1-8
  112. Chen L, Tran H, Batra R, Kim C, Ramprasad R. Machine learning models for the lattice thermal conductivity prediction of inorganic materials. Computational Materials Science. 2019;170:109155
  113. Juneja R, Singh AK. Unravelling the role of bonding chemistry in connecting electronic and thermal transport by machine learning. Journal of Materials Chemistry A. 2020;8(17):8716-8721
  114. Juneja R, Singh AK. Guided patchwork kriging to develop highly transferable thermal conductivity prediction models. Journal of Physics: Materials. 2020;3(2):024006
  115. Chakraborty P, Liu Y, Ma T, Guo X, Cao L, Hu R, et al. Quenching thermal transport in aperiodic superlattices: A molecular dynamics and machine learning study. ACS Applied Materials & Interfaces. 2020;12:8795-8804
  116. Liu J, Han S, Cao G, Zhou Z, Sheng C, Liu H. A high-throughput descriptor for prediction of lattice thermal conductivity of half-Heusler compounds. Journal of Physics D: Applied Physics. 2020;53:315301
  117. Wan J, Jiang JW, Park HS. Machine learning-based design of porous graphene with low thermal conductivity. Carbon. 2020;157:262-269
  118. Du Y, Ying P, Zhang J. Prediction and optimization of the thermal transport in hybrid carbon-boron nitride honeycombs using machine learning. Carbon. 2021;184:492-503
  119. Zhu Y, He R, Gong S, Xie T, Gorai P, Nielsch K, et al. Charting lattice thermal conductivity for inorganic crystals and discovering rare earth chalcogenides for thermoelectrics. Energy & Environmental Science. 2021;14:3559-3566
  120. Ju S, Yoshida R, Liu C, Wu S, Hongo K, Tadano T, et al. Exploring diamondlike lattice thermal conductivity crystals via feature-based transfer learning. Physical Review Materials. 2021;5:053801
  121. Chowdhury PR, Ruan X. Unexpected thermal conductivity enhancement in aperiodic superlattices discovered using active machine learning. NPJ Computational Materials. 2022;8:12
