Open access peer-reviewed chapter - ONLINE FIRST

Machine Learning in Estimating CO2 Emissions from Electricity Generation

By Marco Rao

Submitted: November 12th 2020. Reviewed: March 26th 2021. Published: April 12th 2021

DOI: 10.5772/intechopen.97452


In recent decades, there has been a remarkable rise in the development and application of Machine Learning (ML) approaches and techniques for the modeling, design and prediction of energy systems. This work presents a simple but significant application of an ML approach, the Support Vector Machine (SVM), to the estimation of CO2 emissions from electricity generation. The CO2 emissions were estimated in a framework of Cost-Effectiveness Analysis between two competing technologies in electricity generation, using data for a Combined Cycle Gas Turbine (CCGT) plant provided by the IEA for Italy in 2020. Unlike other applications of ML techniques, which are usually developed to address engineering issues in energy generation, this work is intended to provide useful insights for decision support in energy policy.


  • CO2 emissions
  • energy systems
  • machine learning
  • support vector machines
  • cost-effectiveness analysis
  • forecasting

1. Introduction

The science of decision support is foundational for every type of policy, and this work offers a proposal to analyze its role in energy policy.

An example of application of a particular machine learning (ML) technique to an energy policy problem is presented. It is important to understand the role of ML in energy and environmental analysis, for two solid reasons.

The first concerns the need to process large volumes of data and to model complex relationships, which is typical of both energy analysis and environmental analysis. In this context, the use of AI (Artificial Intelligence) and machine learning is almost mandatory.

The second concerns the need for a concerted effort to identify how these tools may best be applied to tackle major problems of recent years, such as climate change [1]. In this respect, CO2 emissions are a key variable that we must control to achieve the global objective of mitigating damage to humanity.

This work has a specific goal. Using known tools from the scientific literature on energy generation costs, we intend to show how the use of a machine learning technique (the support vector machines, SVM) can produce a more accurate modeling of these costs.

The link with CO2 emissions is provided by the possibility of using the cost model in a cost-effectiveness analysis (C-E A), in which the cost is represented by the Levelised Cost of Energy (LCOE) and the effectiveness is represented by the CO2 emissions of the technologies considered per unit of energy produced.

The CO2 estimation is then obtained by selecting the best generation options according to the C-E A results.

The meaning of this work is the following.

Imagine that you are an energy analyst, in the public or private sector, and you need to use only one or just a few variables (such as a forecast of the cost of natural gas) to estimate the costs of an electricity generation technology.

This task can be accomplished using a cost model of electricity generation in which a single piece of information can vary, leaving everything else unchanged (or imposing a certain trend on it).

The metric used is the LCOE (Levelised Cost of Energy) indicator provided by the IEA (International Energy Agency), using 2020 data.

Once a certain level of accuracy has been obtained in the estimate of the energy cost, it is possible to move into a context of cost-effectiveness analysis, in which the best energy option in terms of Incremental Cost-Effectiveness Ratio (ICER) is selected to produce energy and, finally, to derive the corresponding level of CO2 emissions for the time horizon in which that technology remains the “best option”.

In other words, the estimate of the energy cost and the cost-effectiveness analysis allow us to trace scenarios for the electricity generation mix and, finally, to calculate a quantitative forecast of the CO2 emitted.

This work simply intends to show the application of one of the existing machine learning techniques to the estimation of the LCOE, starting from some explanatory variables.

A linear model (LM) and an SVM are compared in the prediction of the LCOE value for a combined cycle gas plant (CCGT) with a focus on the fuel cost, Operation and Maintenance (O&M) cost and CO2 price using IEA data for Italy in 2020.

The work carried out intends to highlight the possibilities of applying machine learning techniques not only in the purely engineering aspects of energy systems, but also in the statistical-economic ones at a higher level of abstraction.

A few words on why we focus on power generation systems.

As countries work towards a low carbon world, it is crucial that policymakers, modelers, and experts have at their disposal reliable information on the cost of generation.

IEA [2] reports that the levelised costs of electricity generation of low-carbon technologies are increasingly falling below the costs of conventional fossil fuel generation. Renewable energy costs have continued to fall in recent years and are now competitive with dispatchable fossil fuel-based electricity generation in many countries.


2. Methodology

This section presents the main tools used in this work: the LCOE methodology provided by the IEA and the SVM, the machine learning technique used. Before the SVM is presented, a very brief reminder about ML and its use in energy systems and CO2 emission estimates is provided.

2.1 Levelised cost of energy

The Levelised Cost of Energy (LCOE) is the tool selected to measure the cost of a unit of energy produced by the technologies considered. The LCOE methodology is described in the joint report by the International Energy Agency (IEA) and the OECD (Organisation for Economic Co-operation and Development) Nuclear Energy Agency (NEA), now in its ninth edition in a series of studies on electricity generating costs [1]. This report includes cost data on power generation from natural gas, coal, nuclear, and a broad range of renewable technologies.

The metric chosen for plant-level cost is the well-known levelised cost of electricity (LCOE). The IEA is now also considering system effects and system costs with the help of the broader value-adjusted LCOE (VALCOE) metric, which is not considered here.

The LCOE is widely considered the principal tool for comparing the plant-level unit costs of different baseload technologies over their operating lifetimes, since it indicates the economic costs of a technology family rather than the financial costs of a specific project in a specific market. Because it equates discounted average costs with a stable remuneration over lifetime electricity production, the LCOE is closer to the costs of electricity production in regulated electricity markets with stable tariffs than to the variable prices of deregulated markets.

Despite many limitations, the LCOE has maintained its utility and appeal since it is a uniquely straightforward, transparent, comparable, and well-understood metric, and it remains a widely used tool for modeling, policy making and public debate.

The calculation of the LCOE is based on the equivalence of the present value of the sum of discounted revenues and the present value of the sum of discounted costs. Put another way, on the left-hand side one finds the discounted sum of benefits and on the right-hand side the discounted sum of costs:

$$\sum_{t} P_{MWh} \cdot MWh \cdot (1+r)^{-t} = \sum_{t} \left( Capital_t + O\&M_t + Fuel_t + Carbon_t + D_t \right) \cdot (1+r)^{-t} \qquad (1)$$

P_MWh  The constant lifetime remuneration to the supplier for electricity;

MWh  The amount of electricity produced annually in MWh;

(1+r)^(-t)  The discount factor for year t, where r is the real discount rate corresponding to the cost of capital;

Capital_t  Total capital construction costs in year t;

O&M_t  Operation and maintenance costs in year t;

Fuel_t  Fuel costs in year t;

Carbon_t  Carbon costs in year t;

D_t  Decommissioning and waste management costs in year t.

P_MWh is equal to the levelised cost of electricity (LCOE).

Eq. (1) is the formula used here to calculate average lifetime levelized costs based on the costs for investment, operation and maintenance, fuel, carbon emissions and decommissioning and dismantling provided by OECD countries and selected non-member countries.
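As an illustration, Eq. (1) can be solved for the constant remuneration, which gives the LCOE as the ratio between discounted lifetime costs and discounted lifetime production. The following is a minimal sketch of that calculation; the function name `lcoe` and the three-year toy figures are assumptions made for the example and are not IEA data.

```python
def lcoe(capital, om, fuel, carbon, decommissioning, energy_mwh, rate):
    """Levelised cost (USD/MWh) from yearly cost streams (USD) and output (MWh).

    Every list is indexed by year t = 0, 1, ...; rate is the real discount
    rate r (for example 0.07 for 7%).
    """
    discounted_costs = 0.0
    discounted_energy = 0.0
    for t, mwh in enumerate(energy_mwh):
        factor = (1.0 + rate) ** (-t)  # discount factor (1 + r)^(-t)
        yearly_cost = capital[t] + om[t] + fuel[t] + carbon[t] + decommissioning[t]
        discounted_costs += yearly_cost * factor
        discounted_energy += mwh * factor
    return discounted_costs / discounted_energy


# Hypothetical three-year toy plant (illustrative numbers only).
capital = [1000.0, 0.0, 0.0]
om = [0.0, 50.0, 50.0]
fuel = [0.0, 200.0, 200.0]
carbon = [0.0, 40.0, 40.0]
decommissioning = [0.0, 0.0, 10.0]
energy = [0.0, 10.0, 10.0]

lcoe_undiscounted = lcoe(capital, om, fuel, carbon, decommissioning, energy, 0.0)
lcoe_discounted = lcoe(capital, om, fuel, carbon, decommissioning, energy, 0.07)
print(lcoe_undiscounted, lcoe_discounted)
```

In this toy case the capital cost is paid in year 0 while production starts later, so a positive discount rate raises the LCOE relative to the undiscounted value.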

2.2 Machine learning

Machine learning (ML) is the field of artificial intelligence (AI) that provides methods for learning from data over time, producing algorithms that perform a task without being explicitly programmed to do so.

The literature on ML is relatively recent but so vast that only a pointer to review works can be given here, as an entry point to this field (note 1).

Machine learning approaches are normally categorized as follows.

  • Supervised machine learning, which trains itself on a labeled data set;
  • Unsupervised machine learning, which uses unlabeled data with algorithms that extract the features required to label, sort, and classify the data in real time, without human intervention;
  • Semi-supervised learning (SsL), a middle ground between supervised and unsupervised learning: SsL uses a smaller labeled data set during training and performs classification and feature extraction on a larger, unlabeled data set;
  • Reinforcement machine learning, which is similar to supervised learning but does not require sample data for training, since it learns by trial and error.

The machine learning algorithms for use with labeled data include: regression algorithms (such as linear and logistic regression); decision trees (based on a set of decision rules to perform classification); and instance-based algorithms, which use classification to estimate how likely a data point is to be a member of one group or another based on its proximity to other data points.

Methods for use with unlabeled data include: clustering algorithms (such as K-means, TwoStep, and Kohonen clustering); association algorithms (which find patterns in data by identifying ‘if-then’ relationships, namely association rules); neural networks (which create a layered network of calculations featuring an input layer, where data enter; one or more hidden layers, where calculations are performed; and an output layer, where each conclusion is assigned a probability); and deep neural networks, which use multiple hidden layers, each of which successively refines the results of the previous layer. Deep learning models are typically unsupervised or semi-supervised. Certain types of deep learning models, including convolutional neural networks (CNNs) and recurrent neural networks (RNNs), are driving progress in areas such as computer vision, natural language processing (including speech recognition), and self-driving cars.

In this work, the machine learning approach used is the SVM.

SVMs (note 2) are machine learning algorithms built on statistical learning theory and the principle of structural risk minimization. In pattern recognition, classification, and regression analysis, SVMs often outperform other methodologies. The wide range of SVM applications in the field of load forecasting is due to their ability to generalize; in addition, local minima pose no problem for SVMs.

The SVM was chosen in this work for the sake of simplicity: the Support Vector Regression (SVR) [5] performed here makes it extremely easy to compare a traditional statistical tool with a competing machine-learning-based one.
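For reference, the standard textbook form of the SVR problem (the epsilon-insensitive formulation) is recalled below. It is given as background only; the symbols w, b, φ, C, ε and the slack variables ξ are the usual ones from the SVR literature and are not taken from this chapter.

```latex
\min_{w,\,b,\,\xi,\,\xi^{*}} \quad \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \left( \xi_i + \xi_i^{*} \right)
\quad \text{subject to} \quad
\begin{cases}
y_i - \langle w, \phi(x_i) \rangle - b \le \varepsilon + \xi_i, \\
\langle w, \phi(x_i) \rangle + b - y_i \le \varepsilon + \xi_i^{*}, \\
\xi_i,\ \xi_i^{*} \ge 0, \qquad i = 1, \dots, n.
\end{cases}
```

Here x_i would be the explanatory variable (for instance the fuel cost) and y_i the LCOE. Deviations smaller than ε are not penalised, and C sets the trade-off between the flatness of the fitted function and larger deviations. C, ε and the kernel parameters are the quantities usually adjusted when an SVR is tuned.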

The available applications of SVMs in the energy sector are often oriented towards the engineering side (note 3), while in this work the approach is oriented towards decision support in the field of energy policy.

Using one of the possibilities offered by SVMs, namely the SVR, the following sections show how it is possible to obtain more accurate forecasts of costs per unit of energy produced, using the LCOE as a metric.

The best available accuracy is then used in a context of cost-effectiveness analysis.

In the following, we present a method for selecting, among competing options (options that may differ even by slight changes in some significant LCOE parameters), the one characterized by the best Incremental Cost-Effectiveness Ratio (ICER).

Being able to make this choice during the lifetime of the plant makes it possible to identify the best technology available, year by year, and to obtain the corresponding profile of the associated CO2 emissions.

2.2.1 Machine learning for energy systems and CO2 emission estimation

The growing use of data collection devices in energy systems has resulted in a massive accumulation of data (an increasing number of smart sensors are now extensively used in energy production and consumption), leading to a continuous production of big data and, consequently, to a large number of opportunities and challenges for decision support science.

Today, ML models in energy systems are essential for the predictive modeling of production, consumption, and demand, owing to their accuracy, efficacy, and speed, and for providing an understanding of energy system functionality in the context of complex human interactions.

Ref. [7] provides a comprehensive review of the essential ML models, presenting the state of the art of ML in energy systems and discussing likely future trends.

Machine learning has been used to estimate CO2 emissions from energy systems in several contexts, using different approaches. Among an increasing number of works in recent years, it is possible to recall:

[8], on the flexibility of electricity demand, in which a machine learning algorithm is developed to forecast the CO2 emission intensities in European electrical power grids, distinguishing between average and marginal emissions in the Danish bidding zone DK2;

[9], an investigation of the causal relationship among solar and wind energy production, coal consumption, economic growth, and CO2 emissions for three countries;

[10], on the linkage between energy resources and economic development, whose focus is to develop and apply a machine learning approach to predict gross domestic product (GDP) from the mix of energy resources with higher predictive accuracy;

[11], which proposes a standardized framework for estimating indirect building carbon emissions within the boundaries of various types of Local Climate Zones (LCZs), using a random forest machine learning method;

[12], on the relationship among the iron and steel industries, air pollution and economic growth in China (using a Long Short-Term Memory, LSTM, approach);

[13], on forecasting energy-consumption-related carbon emissions for the Beijing-Tianjin-Hebei region;

[14], on the use of gray relational analysis to identify the factors that are strongly correlated with carbon emissions in China, with the aim of reducing carbon emissions by studying their prediction (using LSTM);

[15], on the creation of an automated, high-resolution forest carbon emission monitoring system that will track near-real-time changes and support actions to reduce the environmental impacts of gold mining and other destructive forest activities in the Peruvian Amazon (using deep learning models);

[16], on the use of a random forest machine learning regression workflow to map the country of Peru by combining 6.7 million hectares of airborne LiDAR measurements of top-of-canopy height with thousands of Planet Dove satellite images, in order to create cost-effective and spatially explicit indicators of aboveground carbon stocks and emissions for tropical countries, as a transformative tool to quantify the climate change mitigation services that forests provide;

[17], which aims to determine whether China can achieve its commitment to reduce carbon emission intensity by 2030; a general regression neural network (GRNN) forecasting model based on improved fireworks algorithm (IFWA) optimization is constructed to forecast total carbon emissions (TCE) and carbon emission intensity (CEI) in 2016–2040.

2.3 Our methodology

The present work reports an experiment performed using a simple LCOE model, built according to the basic methodology proposed by the IEA. The experiment is simple and straightforward. Two energy scenarios were produced for the CCGT plant over a period of 30 years: one based on a hypothesized change in the fuel cost, the other based on a hypothesized change in fuel cost, O&M cost, and CO2 price.

In each scenario, a certain LCOE profile is obtained for the time horizon considered. A simple regression analysis is then performed on this variable, using as explanatory variables, first the cost of fuel, and then the operating costs.
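The linear-model side of this regression can be sketched with a closed-form ordinary least squares fit on a single predictor. The helper names `fit_line` and `predict`, and the synthetic LCOE series (built as an exact linear function of the fuel cost, with an arbitrary intercept of 20 and slope of 1.8), are assumptions made for illustration and do not reproduce the chapter's model.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x):
    """Fitted values for each element of x."""
    return [intercept + slope * xi for xi in x]


# Hypothetical data: only the fuel cost changes, and the LCOE responds linearly.
fuel_cost = [45.5 * 0.98 ** t for t in range(10)]   # USD/MWh
lcoe_values = [20.0 + 1.8 * f for f in fuel_cost]   # USD/MWh, synthetic

intercept, slope = fit_line(fuel_cost, lcoe_values)
fitted = predict(intercept, slope, fuel_cost)
max_residual = max(abs(a - p) for a, p in zip(lcoe_values, fitted))
print(intercept, slope, max_residual)
```

When the target is an exact linear function of the predictor, as in this synthetic case, the residuals of the linear fit are at the level of floating-point rounding.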

The analysis is carried out using both an LM and the SVM, with further manual tuning of the latter to improve its performance. Manual tuning of the SVR was used for the sake of simplicity, since the main goal of the study is to suggest the application of this ML technique to gain forecasting accuracy for use in the following phase, the cost-effectiveness analysis (note 4).

To evaluate the accuracy of the forecast, the Root Mean Square Error (RMSE), the Mean Absolute Error (MAE) and the Mean Absolute Percentage Error (MAPE) were used (note 5).
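The three accuracy measures (RMSE, MAE and MAPE) can be written in a few lines. The sketch below uses their standard definitions; expressing the MAPE as a fraction rather than a percentage is an assumption of this example, and the sample values are invented.

```python
import math


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error as a fraction; actual values must be nonzero."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Invented LCOE values (USD/MWh) and forecasts, for illustration only.
actual = [100.0, 110.0, 120.0]
predicted = [98.0, 113.0, 120.0]

print(rmse(actual, predicted), mae(actual, predicted), mape(actual, predicted))
```

The RMSE is never smaller than the MAE, and it weights large errors more heavily, which is one reason the two are often reported together.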

This simple test was performed to show the accuracy of the fuel cost and O&M cost as predictors of the CCGT LCOE.

Once the best technique has been established, the data from the two scenarios are modified in a third scenario, under hypotheses explained below, to perform a C-E A between a technology represented by IEA data and another of the same type with small changes in O&M costs. Using the ICER as the selection criterion, it is possible to select the best energy generation option and, finally, to trace the corresponding CO2 emission estimate over the plant’s lifetime.

All data come from the IEA [2].

The LCOE model.

First, a LCOE model based on IEA Eq. (1), with the following level of detail, was built.

The basic relationships of the model are:



CC       Cost of Capital (USD/MWh)

Power      net capacity (MWe)

AVLFmin    AVerage Load Factor min value (%)

AVLFmax    AVerage Load Factor max value (%)

AAF      Average Availability Factor (%)

AuxP      Auxiliary Power (%)

Lifetime     Time horizon of plant (years).

wdmin     min weight of cost of debt on total cost (%)

wdmax     max weight of cost of debt on total cost (%)

kdmin     min value of debt rate (%)

kdmax     max value of debt rate (%)

tmin      min value of taxation (%)

tmax      max value of taxation (%)

krftmin    min value of risk-free rate (%)

krftmax    max value of risk-free rate (%)

EMRPmin   min value of Expected Market Risk Premium (%)

EMRPmax   max value of Expected Market Risk Premium (%)

Bmin      min value of Beta (%)

Bmax      max value of Beta (%)

CnsTmin    min value of Construction Time (years)

CnsTmax    max value of Construction Time (years)

FOMmin    Fixed Operation and Maintenance Costs min (USD/MWh)

FOMmax    Fixed Operation and Maintenance Costs max (USD/MWh)

VOMmin    Variable Operation and Maintenance Costs min (USD/MWh)

VOMmax    Variable Operation and Maintenance Costs max (USD/MWh)

Cfuemin    min value of Costs of Fuel (USD/MWh)

Cfuemax    max value of Costs of Fuel (USD/MWh)

Effmin     min value of Efficiency (%)

Effmax     max value of Efficiency (%)

PCO2min    min value of CO2 price (USD/MWh)

PCO2max    max value of CO2 price (USD/MWh)

Decommin   min value of Decommissioning (USD/MWh)

Decommax   max value of Decommissioning (USD/MWh)

All other parameters are set using the IEA values.

We defined two types of scenario, based on the following assumptions about certain variables of the model. The basic hypothesis is a constant decrease of 2% per year for every variable changed, except every 6 years (an entirely arbitrary choice), when a larger variation is applied to simulate an increasing amplification of the cycle. Every sixth year, the percentage variation with respect to the previous value is doubled and then multiplied by the occurrence number: the first time, at year 6, the variation is therefore roughly 4%, namely 2% multiplied by 2 and then by occurrence number 1.

Table 1 describes the hypothesis used in this first step of the analysis.

Scenario | Fuel cost (baseline: 45.5 USD/MWh) | O&M cost (baseline: 6.99 USD/MWh) | CO2 price (baseline: 10.1 USD/MWh)
Scenario 1 | Linear decrease of 2% per year except every 6 years | Constant | Constant
Scenario 2 | Linear decrease of 2% per year except every 6 years | Linear decrease of 2% per year except every 6 years | Linear decrease of 2% per year except every 6 years

Table 1.

Scenarios used for the regression of LCOE on fuel cost and O&M cost
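One possible reading of the scenario rule in Table 1 is sketched below: the value falls by 2% per year relative to the previous year, and every sixth year the change is 2% multiplied by 2 and by the occurrence number (4% at year 6, 8% at year 12, and so on). Treating the sixth-year change as a decrease, compounding on the previous value, and taking year 0 as the baseline are all assumptions of this sketch; the function name `scenario_path` is hypothetical.

```python
def scenario_path(baseline, years=30, base_rate=0.02, cycle=6):
    """Yearly values of a cost item under the amplified-cycle assumption.

    Ordinary years: the value falls by base_rate relative to the previous year.
    Every `cycle`-th year: the change is base_rate * 2 * k, where k counts the
    occurrences (k = 1 at year 6, k = 2 at year 12, ...). Applying that larger
    change as a decrease is an assumption of this sketch.
    """
    values = [baseline]  # year 0 holds the baseline value
    for year in range(1, years + 1):
        if year % cycle == 0:
            rate = base_rate * 2 * (year // cycle)
        else:
            rate = base_rate
        values.append(values[-1] * (1.0 - rate))
    return values


fuel_cost = scenario_path(45.5)   # scenarios 1 and 2, USD/MWh
om_cost = scenario_path(6.99)     # scenario 2 only, USD/MWh
co2_price = scenario_path(10.1)   # scenario 2 only, USD/MWh

print(fuel_cost[:7])
```

In scenario 1 only `fuel_cost` would vary, with the O&M cost and CO2 price held at their baselines; in scenario 2 all three series would follow the same rule.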

3. Results

Figures 1 and 2 show the results obtained by performing an SVR on the data from the IEA [1] for the first scenario considered, before and after tuning respectively.

Figure 1.

Comparison between LM and SVMBT in predicting LCOE of CCGT technology for Italy (simulating data over lifetime of the plant - base data: Italy, 2020 - sources: IEA) - scenario 1 - Y = LCOE (USD/MWh), X = fuel cost (USD/MWh).

Figure 2.

Comparison between LM and SVMAT in predicting LCOE of CCGT technology for Italy after tuning (simulating data over lifetime of the plant - base data: Italy, 2020 - sources: IEA) - scenario 1 - Y = LCOE (USD/MWh), X = fuel cost (USD/MWh).

The error values (RMSE, MAE and MAPE) for the Linear Model (LM) and for the SVM, where SVMBT and SVMAT denote the SVM model before and after tuning, are:

Model | RMSE | MAE | MAPE
Linear Model | 1.30E-14 | 8.39E-15 | 8.39E-17
Tuned SVM | 1.74E-03 | 1.54E-03 | 1.54E-05

Tuning clearly improves the performance of the SVM. Nevertheless, given the strong relationship between the fuel cost and the LCOE, the linear model is clearly preferable to the SVM in this scenario (Figures 1 and 2).

Figure 3.

Comparison between LM and SVMBT in predicting LCOE of CCGT technology for Italy (simulating data over lifetime of the plant - base data: Italy, 2020 - sources: IEA) - scenario 2 - Y = LCOE, X = O&M Cost.

Figure 4.

Comparison between LM and SVMAT in predicting LCOE of CCGT technology for Italy after tuning (simulating data over lifetime of the plant - base data: Italy, 2020 - sources: IEA) - scenario 2 - Y = LCOE, X = O&M Cost.

The error values (RMSE, MAE and MAPE) for the Linear Model (LM) and for the SVM Model After Tuning (SVMAT) are:

Model | RMSE | MAE | MAPE
Linear Model | 3.87E+00 | 2.70E+00 | 2.70E-02
Tuned SVM | 2.61E+00 | 1.45E+00 | 1.45E-02

Recalling that in the second case the O&M cost was used as a predictor, we can better appreciate the gain in terms of RMSE obtained by using the SVM.

The higher accuracy of the SVR with respect to the LM can be used to perform a CO2 emission estimation in a cost-effectiveness analysis.

Let us look at a simple and plain experiment based on IEA data [2] for Italy, 2020 in the following scenario:

Scenario | Fuel cost (baseline: 45.5 USD/MWh) | O&M cost (baseline: 6.99 USD/MWh) | CO2 price (baseline: 10.1 USD/MWh)
Scenario 3 | Decrease of 15% in the 15th year, then linear decrease of 1% for the rest of the lifetime | Decrease of 15% in the 15th year, then linear decrease of 1% for the rest of the lifetime | Decrease of 15% in the 15th year, then linear decrease of 1% for the rest of the lifetime

In scenario 3 we ran a simulation based on the hypothesis of a sudden shock to the three variables reported above in the 15th year, immediately followed by a linear decrease until the end of the lifetime, starting from IEA 2020 data as baseline values.

For scenario 3, the errors (RMSE, MAE and MAPE) in predicting the LCOE from the O&M cost over the considered time horizon are:

Model | RMSE | MAE | MAPE
Linear Model | 4.25878 | 3.49147 | 0.03491
Tuned SVM | 2.58541 | 1.52378 | 0.01524

In Cost-Effectiveness Analysis it is possible to calculate the Incremental Cost-Effectiveness Ratio (ICER), using the LCOE as the measure of cost and the quantity of CO2 emitted as the measure of effectiveness. The ICER can be used as a selection criterion between different options; the winning option will then produce a certain level of emissions.
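For reference, the ICER in its usual textbook form compares an option with an alternative by dividing the difference in cost by the difference in effectiveness. The notation below (C for cost, E for effectiveness, subscripts 0 and 1 for the two options) is introduced here for illustration and is not taken from the chapter.

```latex
\mathrm{ICER} = \frac{C_1 - C_0}{E_1 - E_0}
```

In the setting described above, C would be the LCOE (USD/MWh) of each option and E a measure of effectiveness derived from its CO2 emissions per unit of energy produced.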

Now, let us imagine comparing two types of plants of the same technological family, in this case the CCGT. In this hypothetical exercise, the second type of plant is characterized by higher operating costs (+5% of the IEA base value).

In addition to this, let us imagine that the second type of plant has an average load factor of 94%.

Now, let us repeat the simulation performed for scenario 3 for the first type of CCGT plant (the real one), but only from the 20th year.

The meaning of this operation is as follows:

  • to use systems with different characteristics (in this case we have changed the O&M costs and the load factor of a single technology family);

  • to calculate the ICER corresponding to each plant in a defined time interval (in this case, from when the LCOE starts to vary);

  • to calculate the degree of uncertainty on the value of the ICER thanks to the MAPE of the SVR, defining the variation range for the ICER6;

  • to select the technology that has the lowest ICER and to calculate the corresponding emissions over the time horizon considered;

  • finally, to calculate the emissions profile corresponding to the winning technology, year by year.
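The selection steps listed above can be sketched as follows. Several elements are assumptions made for illustration: each plant's ICER is computed against a common reference option, effectiveness is expressed as CO2 avoided per MWh relative to that reference, the uncertainty band is ICER -/+ ICER * MAPE with the MAPE given as a fraction, and all numbers and names (`plant_A`, `plant_B`, `select_option`) are hypothetical.

```python
def icer(cost, effect, ref_cost, ref_effect):
    """Incremental cost per unit of incremental effect versus a reference option.

    The effect must differ from the reference effect, otherwise the ratio is undefined.
    """
    return (cost - ref_cost) / (effect - ref_effect)


def icer_range(value, mape):
    """Uncertainty band ICER -/+ ICER * MAPE, with MAPE given as a fraction."""
    delta = abs(value) * mape
    return value - delta, value + delta


def select_option(options, reference, mape):
    """Return (name, icer, band) for the option with the lowest ICER."""
    best = None
    for name, (cost, effect) in options.items():
        value = icer(cost, effect, reference[0], reference[1])
        if best is None or value < best[1]:
            best = (name, value, icer_range(value, mape))
    return best


# Hypothetical inputs: (LCOE in USD/MWh, CO2 avoided in t/MWh versus the reference).
options = {"plant_A": (100.0, 0.05), "plant_B": (104.0, 0.10)}
reference = (90.0, 0.0)

name, value, band = select_option(options, reference, mape=0.015)
print(name, value, band)
```

Repeating this selection for each year of the time horizon would give the sequence of winning options from which the emissions profile is built.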

The results are shown in Figure 5.

Figure 5.

CO2 emissions from different kinds of CCGT plants in scenario 3 (sources: IEA, 2020 + hypothetical data).

Figure 5 illustrates what happens when the ICER criterion is used as a selector of the winning generation option. For the first 20 years, the first type of plant is selected, and the corresponding emissions are those of the blue line. From year 20 onwards, using the ICER as a criterion means choosing the second type of plant, and the new emissions profile is shown by the orange curve.
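The construction of such a profile can be sketched as follows: for each year, the emissions are those of whichever plant is selected in that year. The emission factors and annual production figures below are invented for the example and are not the values behind Figure 5; the names `plant_1`, `plant_2` and `emissions_profile` are hypothetical.

```python
def emissions_profile(winner_by_year, emission_factor, energy_mwh):
    """Yearly CO2 emissions (t) of whichever plant is selected in each year."""
    return [emission_factor[w] * energy_mwh[w] for w in winner_by_year]


# Hypothetical inputs: plant 1 is selected for the first 20 years, plant 2 afterwards.
winner_by_year = ["plant_1"] * 20 + ["plant_2"] * 10
emission_factor = {"plant_1": 0.35, "plant_2": 0.35}            # t CO2 per MWh, illustrative
energy_mwh = {"plant_1": 3_000_000.0, "plant_2": 3_300_000.0}   # MWh per year, illustrative

profile = emissions_profile(winner_by_year, emission_factor, energy_mwh)
print(profile[19], profile[20])
```

With these invented figures the two plants share the same emission factor, so the step in the profile at the switch comes only from the different annual production, as would follow from a different load factor.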

4. Conclusions

ML can help in providing accurate forecasts of CO2 emissions from power generation, especially when we face simultaneous variation of major drivers (such as the fuel cost, the operating cost of the plant and so on). Only a small part of the possible comparisons between traditional techniques and a particular ML method was shown, focusing on the better performance of the ML method (the SVM) with respect to the traditional one (the LM).

In our case, the steps performed were:

  1. improving LCOE forecasting performance;

  2. comparing multiple competing options by use of the ICER in Cost-Effectiveness Analysis;

  3. considering the uncertainty of the ICER using the MAPE calculated for the SVM (used in this case, but it is just one option);

  4. choosing the best technology and calculating the CO2 emissions for it;

  5. defining the trend of the CO2 emissions over the lifetime of the plant from step 4.

Recalling that a basic LCOE model can be brought to a high level of granularity, it is easy to imagine how this type of analysis could gain in depth and significance if the required data are available. Indeed, even when data are missing, meaningful simulations can be produced by using each available piece of information on energy costs.

The experiment was conducted at the highest level of simplicity to better focus on the reasons that suggest integrating ML not only into the engineering aspects of electricity generation but also into decision support tools for energy policy.

Conflict of interest

The authors declare no conflict of interest.


Notes

  1. Here we just mention a recent review of the state of the art in machine learning techniques [3].
  2. For a good introduction to this topic see [4].
  3. See, for example, [6].
  4. Indeed, manual tuning is often considered one of the most significant choices [18].
  5. See [19] for a complete discussion of the metrics used.
  6. Namely, ICER max/min = ICER +/− ICER*MAPE.

© 2021 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Marco Rao (April 12th 2021). Machine Learning in Estimating CO2 Emissions from Electricity Generation [Online First], IntechOpen, DOI: 10.5772/intechopen.97452. Available from:
