
Application of Central Composite Design with Design Expert v13 in Process Optimization

Written By

Chigoziri N. Njoku and Samuel K. Otisi

Submitted: 15 July 2022 Reviewed: 23 December 2022 Published: 23 January 2023

DOI: 10.5772/intechopen.109704

From the Edited Volume

Response Surface Methodology - Research Advances and Applications

Edited by Palanikumar Kayarogannam


Abstract

This chapter focuses on the application of central composite design in response surface methodology. We review the concept and apply it to optimize biodiesel yield from the transesterification of methanol and vegetable oil with a catalyst derived from eggshell, using Design Expert 13. The optimization considered reaction temperature, methanol-to-oil ratio, and catalyst loading as operating conditions. The example data were collected and analyzed from the work of Tshizanga et al., and the results of the randomized experiment showed, at a 95% confidence level, that the factors affected the product output. A yield of about 91% was obtained, with operating parameters optimized at a temperature of around 61°C, a methanol-to-oil ratio of 22.13:1, and a catalyst loading of around 3.7 wt%. The chapter provides a step-by-step guide to carrying out this analysis using Design Expert 13. A reduced quadratic model with a p-value of 0.0325 and an F-value of 3.57 shows the model is significant; an F-value this large would be caused by noise in only 3.25% of cases. The number of runs was reduced to 18, compared to the 20 runs originally used by Tshizanga et al.

Keywords

  • response surface methodology (RSM)
  • central composite design (CCD)
  • design of experiment (DOE)
  • design expert

1. Introduction

It has always proven difficult to quickly select an appropriate experimental design that can simply explicate many response factors, a situation that often calls for a quadratic surface model. CCD can be a good choice for this kind of model. The central composite design (CCD) has emerged as an experimental design that is very handy in process optimization and in the search for the ideal product from ongoing batches. In statistics, a central composite design is an experimental design, helpful in response surface methodology, for building a second-order (quadratic) model for the response variables without having to use a complete three-level factorial experiment [1]. After performing the designed experiment, linear regression is deployed, sometimes iteratively, to obtain results. Coded variables are frequently utilized when creating this design. Most optimizations are done by screening all the potential variables [2]. Here, all the possible independent factors are first identified, and these factors are further refined before response surface methodology is finally used to establish relationships between one or more process variables and their responses. The central composite design is sometimes referred to as the Box-Wilson central composite design, and it has been popular among researchers due to its accuracy.


2. Key terms in central composite design

Some important terms will be used throughout this chapter. This section equips the reader with the terminology needed to fully understand the concept of response surface methodology.

Response surface: This is the surface traced out by the response as the related variables change, typically shown as a two- or three-dimensional plot of the experimental results. Response surface methodology (RSM) describes the use of experimental designs that give response surfaces from which information about the experimental system is deduced [3].

Factor: This can also be called the parameter or predictor. It is an entity that controls an outcome: a change in the output is brought about by manipulating or tweaking the input factor(s). Factors can be set and reset at different levels depending on the needs and conditions of the experiment.

Levels of the factors: A design of experiments is named by the number of levels chosen for each factor; it could be a two- or three-level design. A level is the value of a factor prescribed in the experimental design. Levels may be high, mid, and low (three-level design), or only high and low (two-level design), and are often coded as +1 (high), 0 (mid), and −1 (low). Selecting levels for an experiment often requires field experience. For example, in a three-level experiment on a reactor, previous experience might suggest 30°C (−1), 40°C (0), and 50°C (+1) as suitable low, mid, and high levels respectively.
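As a small aside not in the original text, the coding rule itself is easy to state: a coded level is the actual value centered on the midpoint of the range and scaled by half the range. A minimal Python sketch, using the illustrative 30-50°C reactor levels above:

def to_coded(actual, low, high):
    """Map an actual setting onto the -1..+1 coded scale."""
    center = (low + high) / 2        # mid level, coded 0
    half_range = (high - low) / 2    # distance from center to either extreme
    return (actual - center) / half_range

def to_actual(coded, low, high):
    """Map a coded level back to the original units."""
    center = (low + high) / 2
    half_range = (high - low) / 2
    return center + coded * half_range

print(to_coded(50, low=30, high=50))   # 1.0  -> high level
print(to_actual(-1, low=30, high=50))  # 30.0 -> low level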

Blocking: This tool is used to eliminate the effects of external disturbances and, in the process, improve the efficiency of the experimental design. External disturbances cause different forms of variation. The main goal is to arrange similar experimental runs into one group, so that the whole group becomes a homogeneous unit. For example, suppose a researcher is attempting to increase the yield of biodiesel from a transesterification reaction, and several factors that might have some impact on the yield have been chosen for the initial experimental trials, each studied at a two-level setting (i.e. a low value and a high value). Six experimental trials are chosen by the experimenter, but only four trials can be run per day. Here, each day can be handled separately as a different block [4].

Response: This is the observed result of an experiment, produced by changing the values of the predictors; for example, the yield, selectivity, or conversion of a reactant in a reactor.

Design of experiment (DOE): This is a statistical approach that involves planning, conducting, analyzing, and interpreting the data obtained from experiments [3].

Randomization: While designing and running an experiment, there are several external disturbances, often known as noise factors, which may influence how the experiment turns out. For example, variations in the quality of the raw material due to seasonal change, variations in temperature, and their effects on the overall reaction yield may affect the result, and such factors are difficult to control. Randomization is one of the methods to remove or reduce the errors arising from such uncontrollable factors, and it helps in averaging out the cumulative impact of external disturbances, if present, in the process [3].

Model: This is an equation expressing the relationship between the responses and the factors under study or investigation. Here the outcome is denoted as a function of the experimental factors. For example, a model that has only one parameter x could be expressed as:

y = f(x) + \varepsilon   (1)

A two-parameter model could be represented as:

y = f(x_1, x_2) + \varepsilon   (2)

For an n-parameter model, consider the following equation:

y = f(x_1, x_2, \ldots, x_n) + \varepsilon   (3)

The function f denotes the relationship between the parameters and the response y, with residuals \varepsilon, and is depicted through a polynomial equation. Three different models are described:

Linear model: This is the simplest polynomial model; it contains only linear terms and describes only the linear relationships between the variables and the responses. A linear model with two factors x_1 and x_2 is expressed as:

y = b_0 + b_1 x_1 + b_2 x_2 + \varepsilon   (4)

or, in general:

y = b_0 + \sum_{i=1}^{k} b_i x_i + \varepsilon   (5)

Here, y is the outcome, b_0 is the model intercept, the b_i are the model coefficients, i is the factor index running from 1 to k, and the x_i are the independent variables.

Interaction model: The interaction model holds some extra terms that depict interactions between the various variables, if any. For two factors, it is denoted as:

y = b_0 + b_1 x_1 + b_2 x_2 + b_{12} x_1 x_2 + \varepsilon   (6)

or, in general:

y = b_0 + \sum_{i=1}^{k} b_i x_i + \sum_{i=1}^{k-1} \sum_{j=i+1}^{k} b_{ij} x_i x_j + \varepsilon   (7)

where b_0, b_i, and b_{ij} are the regression (model) coefficients for the intercept, linear, and interaction terms, respectively, and x_i and x_j are the reaction factors.

Quadratic model: Quadratic terms are introduced in the model to help locate an optimal value; they capture any curvature that exists in the response. For two factors with interaction, the model can be represented as:

y = b_0 + b_1 x_1 + b_2 x_2 + b_{12} x_1 x_2 + b_{11} x_1^2 + b_{22} x_2^2 + \varepsilon   (8)

or, in general:

y = b_0 + \sum_{i=1}^{k} b_i x_i + \sum_{i=1}^{k} b_{ii} x_i^2 + \sum_{i=1}^{k-1} \sum_{j=i+1}^{k} b_{ij} x_i x_j + \varepsilon   (9)

where b_0, b_i, b_{ii}, and b_{ij} are the model coefficients for the intercept, linear, quadratic, and interaction terms, respectively, and x_i and x_j are the variables.

Note: The symbol \varepsilon in eqs. (1) to (9) represents the residuals; the linear and interaction models are used during the screening stage.
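To make eqs. (4) to (9) concrete, the sketch below (ours; the data are synthetic) fits the two-factor quadratic model of eq. (8) by ordinary least squares with NumPy, analogous to the regression step the software performs:

import numpy as np

rng = np.random.default_rng(0)
x1 = rng.uniform(-1, 1, 30)
x2 = rng.uniform(-1, 1, 30)
# Synthetic response following eq. (8), plus noise playing the role of the residuals.
y = 5 + 2*x1 - 3*x2 + 1.5*x1*x2 - 4*x1**2 - 1*x2**2 + rng.normal(0, 0.2, 30)

# Model matrix columns: intercept, linear, interaction, and quadratic terms.
X = np.column_stack([np.ones_like(x1), x1, x2, x1*x2, x1**2, x2**2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
# The estimates should land close to (5, 2, -3, 1.5, -4, -1).
print(dict(zip(["b0", "b1", "b2", "b12", "b11", "b22"], b.round(2))))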

Effects: These are the coefficients of the variables, and three kinds can be distinguished. The main effect is a factor's coefficient in the first-order model; the interaction effect is the coefficient of a product of linear terms; and the quadratic effect is the coefficient of the square of a linear term.

Replication: Replication means repeating the entire experiment, or a part of it, under the same operating conditions. It helps to obtain an estimate of the experimental error and to understand and estimate more precisely the factors and their interactions.


3. Response surface methodology for optimization design

The primary goal of optimization design is to minimize unfavorable or undesired outputs or to maximize the desired outputs. Sometimes, simple linear and interaction models are not enough to provide a clear picture of the process. For this study, our goal is to increase the biodiesel yield from the transesterification of methanol and vegetable oil using a catalyst derived from eggshell. The experiment has already been done, and the data are provided in reference [4]; we will use the information from this work to provide a thorough examination of central composite design in process optimization. The variables are reaction temperature, methanol-to-oil ratio, and catalyst weight. To locate an optimum inside the region in which the experiment is conducted, we need a mathematical model that can represent curvature so that it has a local optimum. The appropriate model is the quadratic model shown in eq. (9), which contains linear terms for all factors, squared terms for all factors, and products of all pairs of factors. Response surface designs are generally used for fitting quadratic models. A full factorial design with three levels for each input variable is one such design, but because it requires far more runs than are necessary to fit the model, it is typically not a good choice. The CCD and Box-Behnken designs are the two designs most commonly used in response surface modeling, although only central composite designs are explored in detail here. In these types of designs, the variables take on three or five distinct levels, but not all combinations of these values appear in the design. The steps in CCD for optimization are outlined below:

Preliminary stage: Here, the following steps are done:

  • Choosing the factors and desired levels

  • Determination of the Counts of experimental runs

  • Calculation of alpha (α) and the axial values

  • Selecting the response variables

  • Carrying out the experiments

  • Model selection

Analysis stage: At this stage, the following are done:

  • ANOVA is conducted, where the F-test and lack-of-fit test are used to check significance; the adjusted and predicted R² are also determined at this stage.

  • Next, the model equation is built

  • Comparing values predicted (from the model) and actual values

  • Using 2D and 3D contour plots or graphs to visualize the response(s).

Decision-making stage: Here, the predicted and actual values are compared to determine the residuals, and parameters such as the adjusted R², mean absolute error (MAE), or mean squared error (MSE) are employed to assess the model performance. If the result is acceptable, we can proceed to the final stage; if not, we return to the preliminary stage to see how the model can be adjusted.

Optimization stage: At this stage, the model is ready to be deployed for the optimization process. Design Expert version 13 is very handy for this entire workflow; all we need to do is specify the required values. The details of how to determine the CCD components are given later in this chapter.


4. Box-Behnken design (BBD)

The Box-Behnken design can fit the full quadratic response surface model [5]. Unlike the CCD, the BBD has no embedded factorial or fractional factorial design. In this design, the treatment combinations are at the midpoints of the edges of the cube and at the center, as shown in Figure 1. The BBD is a rotatable design and needs three levels for each factor. It should be considered for experiments with more than two factors, when the optimum is expected to lie in the middle of the factor ranges. A, B, and C in Figure 1 represent factors A, B, and C respectively.

Figure 1.

A representation of the Box-Behnken design.


5. Central composite design

Central composite design (CCD): This is a unique kind of response surface design that can fit a full quadratic model. It comprises a factorial (or fractional factorial) design with center points, augmented by a group of star or axial points. Using the included axial points is an effective method for estimating the coefficients of a second-degree polynomial for the factors [6]. A CCD can be depicted as a square (for a two-factor design) or a cube (for a three-factor design) whose corners represent the levels (high and low, coded +1 and −1 respectively), with star or axial points along the axes, at or outside the square, that account for curvature, and a center point at the origin. The general model for a two-factor full factorial CCD is represented graphically in Figure 2 below.

Figure 2.

A visual depiction of the CCD model for determination of total runs for all experiments for two factors full factorial design. K in the model is the number of factors, C is the replicated central points that help to eliminate pure error and N is the experiment runs required for the design.

Figure 3 displays a three-factor layout for a CCD, made up of a full factorial that forms the cube, where each side is coded −1 and +1 just as in Figure 2 above. The stars stand for axial points, and alpha is the distance from the edge of the cube to the stars.

Figure 3.

A graphical representation of three factors in a full factorial design.


6. Types of central composite design

There are three types of CCD namely:

  • Circumscribed Central Composite Design (CCC)

  • Inscribed Central Composite Design (CCI)

  • Face-Centered Central Composite Design (CCF)

The CCC is a type of CCD in which the axial points form new extremes beyond the levels already used for the factorial portion. The new extremes are determined by a value called alpha (the distance between the new extreme and the edge of the factorial points), bringing each factor up to five levels. Alpha is often chosen to achieve a rotatable design [7].

The CCI type is a modified form of CCC. The axial points are scaled to be within the limits of the factorial factor [8]. The CCI is also a rotatable type and has 5 levels just like the CCC.

For the CCF, the axial points are located at the center of each face of the cube in Figure 3 above (three-factor design), and the design is non-rotatable [9]. It has only three levels. Figure 4 below provides more insight into this type of CCD.

Figure 4.

Three types of central composite design [6].


7. Determining the components of central composite design

Before starting the CCD optimization process, we provide a walk-through of how to calculate all the parameters required to build the model.


8. Calculating the number of experiment runs

To design a CCD experiment in which each factor has two levels (+1 and −1), the full factorial portion contributes 2^k runs, while the axial points, as represented in Figure 2, contribute 2k runs. Let C represent the center points and n the number of times the center point is replicated to eliminate error. Then the total number of experiment runs is given as:

N = 2^k + 2k + nC   (10)

where k is the number of factors selected for the experiment. In our case we have three (3) factors, i.e. temperature, methanol-to-oil ratio, and catalyst weight, with 4 repetitions of the center point. Substituting k = 3, C = 1, and n = 4 (i.e. 4 repetitions) gives N = 18 runs. Conveniently, Design Expert 13 generates this value automatically once the number of factors and repetitions is provided. Keep in mind that the number of center points can also be adjusted by clicking the Options button in the software; in this case we set it to four.
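The arithmetic of eq. (10) can be made explicit in a one-line function (our sketch, not a Design Expert feature):

# Total CCD runs (eq. 10): 2**k factorial + 2*k axial + n*C center points.
def ccd_runs(k, n, C=1):
    return 2**k + 2*k + n*C

print(ccd_runs(k=3, n=4))  # 18 runs, as generated for this design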


9. Calculating alpha (α)

As can be seen, immediately after the factors, n, and C are provided, alpha is calculated automatically, because the minimum parameters needed to compute it have been specified; here we show how the program generates this value. As discussed earlier, alpha is the distance between the new extreme axial points and the edge formed by the factorial levels. The following equation gives the α value for any number of factors:

\alpha = (2^k)^{1/4}   (11)

In our case k is 3, and therefore α = 1.68179, which is in line with the value created by the software. Table 1 below lists the corresponding values for k from 2 to 5 factors.

Factor (k) | Alpha (α)
2 | 1.41421
3 | 1.68179
4 | 2
5 | 2.37841

Table 1.

Factors and corresponding α values.
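Table 1 can be reproduced by evaluating eq. (11) directly; a short sketch:

# Rotatable axial distance (eq. 11): alpha = (2**k) ** (1/4).
for k in range(2, 6):
    print(k, round((2 ** k) ** 0.25, 5))
# Prints 1.41421, 1.68179, 2.0, 2.37841 for k = 2..5, matching Table 1.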


10. Calculating axial values

Before determining the axial points, Table 2 below shows the factor levels and center points that will be used to compute them. The center points are coded as 0, while the low and high levels are designated −1 and +1 respectively. Also keep in mind that the experiment has already been performed, with the data provided from the work of Tshizanga et al. [4] (Table 2).

Factors | Symbols | Low level (−1) | Centre point (0) | High level (+1)
Temperature (°C) | X1 | 60 | 65 | 70
Methanol to Oil ratio | X2 | 15:1 | 22.5:1 | 30:1
Catalyst Weight (wt%) | X3 | 2.0 | 3.5 | 5.0

Table 2.

Experimental ranges of the independent variable.

To compute the axial values, the first step is to find the offset that is added to or subtracted from the mean of the factor levels. Adding gives the higher axial value, coded +α, while subtracting gives the lower axial value, coded −α. These two additional coded values (+α and −α) are the axial levels and bring each factor to a total of five levels. The two equations are given below:

+\alpha\ \text{value} = \bar{X} + \alpha \times (\text{High level} - \text{Low level})/2   (12)

-\alpha\ \text{value} = \bar{X} - \alpha \times (\text{High level} - \text{Low level})/2   (13)

where α can be found using eq. (11), already calculated as 1.68179, and \bar{X} is given by:

\bar{X} = (\text{Low level} + \text{Centre point} + \text{High level})/3   (14)

The divisor 3 is the number of level values being averaged (low, centre, and high), which here happens to coincide with the number of variables, k = 3. At this point, let us get our hands dirty with calculating the values for these three factors.

For temperature:

\bar{X}_1 = (60 + 65 + 70)/3 = 65
+α value = 65 + 1.68179 × (70 − 60)/2 = 73.4090°C (approx. to 4 d.p.)
−α value = 65 − 1.68179 × (70 − 60)/2 = 56.5911°C

For methanol-oil ratio:

\bar{X}_2 = (15 + 22.5 + 30)/3 = 22.5, i.e. 22.5:1
+α value = 22.5 + 1.68179 × (30 − 15)/2 = 35.1134, i.e. 35.1134:1
−α value = 22.5 − 1.68179 × (30 − 15)/2 = 9.8866, i.e. 9.8866:1

For catalyst weight:

\bar{X}_3 = (2 + 3.5 + 5)/3 = 3.5
+α value = 3.5 + 1.68179 × (5 − 2)/2 = 6.0227 wt%
−α value = 3.5 − 1.68179 × (5 − 2)/2 = 0.9773 wt%
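These hand calculations are easy to verify in a few lines of Python (our sketch; the factor ranges come from Table 2):

alpha = (2 ** 3) ** 0.25  # about 1.68179 for k = 3 factors (eq. 11)

def axial_values(low, high):
    """Lower and higher axial points per eqs. (12) and (13)."""
    center = (low + high) / 2
    half_range = (high - low) / 2
    return center - alpha * half_range, center + alpha * half_range

for name, low, high in [("Temperature (degC)", 60, 70),
                        ("Methanol-Oil ratio", 15, 30),
                        ("Catalyst weight (wt%)", 2.0, 5.0)]:
    lo, hi = axial_values(low, high)
    print(f"{name}: -alpha = {lo:.3f}, +alpha = {hi:.3f}")
# Temperature: 56.591 / 73.409; ratio: 9.887 / 35.113; catalyst: 0.977 / 6.023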

We have now shown, step by step, how the software generates the alpha (α) and axial values as the components of the CCD; Table 3 below includes these axial points.

Factors | Symbols | Lower axial point (−α) | Low level (−1) | Centre point (0) | High level (+1) | Higher axial point (+α)
Temperature (°C) | X1 | 56.5911 | 60 | 65 | 70 | 73.4090
Methanol to Oil ratio | X2 | 9.8866 | 15:1 | 22.5:1 | 30:1 | 35.1134
Catalyst Weight (wt%) | X3 | 0.9773 | 2.0 | 3.5 | 5 | 6.0227

Table 3.

Experimental ranges of independent variables including calculated axial (star) values.

Upon specifying the required parameters for the CCD model, the software generates a table in which the experiments are to be conducted and the response recorded for each run. For this case study, our response is the biodiesel yield, which can be determined from the methyl ester and waste vegetable oil weights using the following equation:

\text{Yield}\ (\%) = \frac{\text{Weight of Biodiesel}}{\text{Weight of Oil}} \times 100   (15)


Std | Run | Factor 1, A: Temperature (°C) | Factor 2, B: Methanol-Oil ratio | Factor 3, C: Catalyst Weight (wt%) | Response 1, Biodiesel Yield (%)
1 | 3 | −1.000 | −1.000 | −1.000 |
2 | 15 | 1.000 | −1.000 | −1.000 |
3 | 11 | −1.000 | 1.000 | −1.000 |
4 | 4 | 1.000 | 1.000 | −1.000 |
5 | 2 | −1.000 | −1.000 | 1.000 |
6 | 12 | 1.000 | −1.000 | 1.000 |
7 | 17 | −1.000 | 1.000 | 1.000 |
8 | 8 | 1.000 | 1.000 | 1.000 |
9 | 7 | −1.682 | 0.000 | 0.000 |
10 | 1 | 1.682 | 0.000 | 0.000 |
11 | 16 | 0.000 | −1.682 | 0.000 |
12 | 14 | 0.000 | 1.682 | 0.000 |
13 | 13 | 0.000 | 0.000 | −1.682 |
14 | 6 | 0.000 | 0.000 | 1.682 |
15 | 9 | 0.000 | 0.000 | 0.000 |
16 | 18 | 0.000 | 0.000 | 0.000 |
17 | 5 | 0.000 | 0.000 | 0.000 |
18 | 10 | 0.000 | 0.000 | 0.000 |

Table 4.

Factors’ coded values organized in the standard order.

Immediately after we fill in the required CCD components, Design Expert provides a table of coded factor levels. This is used as a guide for specifying the actual values and their corresponding responses. The center point will be replicated four (4) times instead of six (as done by the original researchers) to reduce the number of experiment runs; the two sets of results will be compared after the optimization stage. Table 4 shows the coded factors, and Table 5 shows the actual values and their responses after experimenting in the laboratory.

Std | Run | Factor 1, A: Temperature (°C) | Factor 2, B: Methanol-Oil ratio | Factor 3, C: Catalyst Weight (wt%) | Response 1, Biodiesel Yield (%)
1 | 3 | 60 | 15 | 2 | 69.29
2 | 15 | 70 | 15 | 2 | 35.2
3 | 11 | 60 | 30 | 2 | 73.39
4 | 4 | 70 | 30 | 2 | 10.66
5 | 2 | 60 | 15 | 5 | 61.29
6 | 12 | 70 | 15 | 5 | 50.23
7 | 17 | 60 | 30 | 5 | 71.48
8 | 8 | 70 | 30 | 5 | 22.06
9 | 7 | 56.591 | 22.5 | 3.5 | 62.18
10 | 1 | 73.409 | 22.5 | 3.5 | 79
11 | 16 | 65 | 9.88655 | 3.5 | 42.68
12 | 14 | 65 | 35.1134 | 3.5 | 39.5
13 | 13 | 65 | 22.5 | 0.977311 | 66.52
14 | 6 | 65 | 22.5 | 6.02259 | 74.04
15 | 9 | 65 | 22.5 | 3.5 | 80.46
16 | 18 | 65 | 22.5 | 3.5 | 89.35
17 | 5 | 65 | 22.5 | 3.5 | 90.98
18 | 10 | 65 | 22.5 | 3.5 | 89.52

Table 5.

Actual factors’ values arranged in the standard order after the experiment.

At this point, we can replace the coded values with the actual values from the previous calculations. The factor columns are generated in a particular pattern, but that is beyond the scope of this chapter; to learn more, we recommend reading "RSM Simplified" by Anderson and Whitcomb [10].
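For readers who want to reproduce the coded layout of Table 4 outside the software, the following sketch (ours, under the design parameters used in this chapter) assembles the three blocks of a CCD, factorial, axial, and center, in standard order:

import itertools
import numpy as np

k, n_center = 3, 4
alpha = (2 ** k) ** 0.25

# Factorial block in standard (Yates) order: first factor varies fastest,
# hence the column reversal of itertools.product's output.
factorial = np.array(list(itertools.product([-1.0, 1.0], repeat=k)))[:, ::-1]
# Axial block: -alpha and +alpha on one axis at a time.
axial = np.vstack([sign * alpha * np.eye(k)[i]
                   for i in range(k) for sign in (-1, 1)])
# Replicated center points.
center = np.zeros((n_center, k))

design = np.vstack([factorial, axial, center])  # 18 x 3, matching Table 4
print(design.round(3))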

11. Results and analysis

Note: When entering the values for the methanol-to-oil ratio in Design Expert, you can ignore the ":1" part, since the amount of oil relative to methanol is always one unit for all the experiment runs.

We can now delve into understanding the data collected to build the model, perform analysis, and finally carry out the optimization. All these steps will be done in design expert software.

12. Understanding the data

The reason for this step is basically to understand the relationships that exist in the data. In a more statistical sense, we need to know whether there is a strong correlation between the variables and the response. If there is appreciable intra-correlation among the factors, then one of them has to be removed, because it will eventually harm the model. Design Expert provides a wonderful dashboard where we can carefully learn more about the data we have collected and make some sense of it. At the left of the software we will see the information panel; the Summary, Graph Columns, and Evaluation subsections are the places to dig the nuggets from the data.

In the Summary section we see the summary statistics of the data: the number of experiment runs, the type of design and model, and the minimum, maximum, mean, standard deviation, and ratio of maximum to minimum response values (Table 6).

Factor | Name | Units | Type | Subtype | Minimum | Maximum | Coded Low | Coded High | Mean | Std. Dev.
A | Temperature | °C | Numeric | Continuous | 56.59 | 73.41 | −1 ↔ 60.00 | +1 ↔ 70.00 | 65.00 | 4.24
B | Methanol-Oil ratio | | Numeric | Continuous | 9.89 | 35.11 | −1 ↔ 15.00 | +1 ↔ 30.00 | 22.50 | 6.36
C | Catalyst Weight | wt% | Numeric | Continuous | 0.9773 | 6.02 | −1 ↔ 2.00 | +1 ↔ 5.00 | 3.50 | 1.27

Table 6.

Summary statistics of the factors.

We have seen that the mean response is quite far from the minimum and maximum responses; this is the primary reason for building the model, to test the statistical significance of the result. If we are satisfied with the significance, we can proceed with the model built in the Evaluation tab (Table 7).

Response | Name | Units | Observations | Minimum | Maximum | Mean | Std. Dev. | Ratio
R1 | Biodiesel Yield | % | 18.00 | 10.66 | 90.98 | 61.55 | 23.49 | 8.53

Table 7.

Summary statistics of the response.

Moving over to the Graph Columns section, there are scatter plots, histograms, and box plots. To make the most sense of this data, the scatter plot is the handiest, since it shows how the factors are correlated with each other. The drop-down at the top left selects the factors to show in the scatter plots, while the correlation plot at the bottom displays correlations as values between −1 and 1 (blue to red): values close to −1 show a strong negative correlation and values close to +1 show a strong positive correlation. We now display the scatter plots of each factor against the biodiesel yield in Figures 5-7 respectively.

Figure 5.

A scatterplot of temperature vs. biodiesel yield.

Figure 6.

A scatterplot of methanol/oil ratio against biodiesel yield.

Figure 7.

A scatterplot of catalyst weight against biodiesel yield.

The plots show that temperature most negatively affects the yield. The correlation plot in Figure 8, produced by the Design Expert software, confirms this claim, since the box with the most blueish color lies at the intersection of the temperature and biodiesel yield columns.

Figure 8.

Correlation plots of all the relationships that exist in the data.
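The same correlation check can be reproduced outside the software; the sketch below (ours) uses the first ten runs of Table 5, in standard order, and prints the pairwise Pearson correlations that the correlation plot colors from blue to red:

import pandas as pd

# First ten runs of Table 5 (standard order), abbreviated for illustration.
df = pd.DataFrame({
    "Temperature": [60, 70, 60, 70, 60, 70, 60, 70, 56.591, 73.409],
    "MeOH_oil":    [15, 15, 30, 30, 15, 15, 30, 30, 22.5, 22.5],
    "Catalyst":    [2, 2, 2, 2, 5, 5, 5, 5, 3.5, 3.5],
    "Yield":       [69.29, 35.2, 73.39, 10.66, 61.29,
                    50.23, 71.48, 22.06, 62.18, 79.0],
})
print(df.corr().round(2))  # pairwise Pearson correlation matrix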

Finally, this section provides a unique tab called Evaluation, where the model is selected and all its parameters are shown. In this case, a quadratic model has been selected by the software, which is the best for CCD. There are two tabs, Results and Graphs, where the model parameters are evaluated.

In the Model tab, the model terms are listed together with their standard errors, variance inflation factors (VIF), Ri² values, and the power of the model, as shown in Table 8.

12.1 Model terms

Power calculations are performed using response type “Continuous” and parameters:

Delta = 2, Sigma = 1.

Power is evaluated over the −1 to +1 coded factor space. Standard errors should be similar to each other in a balanced design. Lower standard errors are better.

The ideal VIF value is 1.0. VIFs above 10 should cause concern. VIFs above 100 should cause alarm, indicating coefficients are poorly estimated due to multicollinearity.

Ideal Rᵢ2 is 0.0. High Rᵢ2 means terms are correlated with each other, possibly leading to poor models. If the design has multilinear constraints, then multicollinearity will exist to a higher extent. This inflates the VIFs and the Rᵢ2, rendering these statistics useless. Use FDS instead.
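For the curious, a VIF is simply 1/(1 − Ri²), where Ri² comes from regressing one model-matrix column on all the others. A minimal sketch of that definition (ours, not Design Expert's internals):

import numpy as np

def vifs(X):
    """VIF for each column of a model matrix X (intercept column excluded)."""
    out = []
    for j in range(X.shape[1]):
        yj = X[:, j]
        others = np.delete(X, j, axis=1)
        Z = np.column_stack([np.ones(len(X)), others])  # regress col j on rest
        beta, *_ = np.linalg.lstsq(Z, yj, rcond=None)
        resid = yj - Z @ beta
        r2 = 1 - resid.var() / yj.var()                 # Ri^2 for column j
        out.append(1 / (1 - r2))
    return out

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))                 # independent columns
print([round(v, 2) for v in vifs(X)])        # VIFs near 1, the ideal value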

The power calculation is the estimated chance of finding a significant effect with the current evaluation model. Power depends on the size and structure of the design, the signal-to-noise ratio (number of standard deviations) for the effect, and the model evaluated. The Options button on the Model tab allows the user to define three signal-to-noise ratios, i.e. the numbers of standard deviations to use. If the power is not large enough (80% or more) for a reasonably sized effect, then the design is underpowered. As can be seen in Table 8, we may consider removing the interaction terms, since they have lower power; this will be done after analyzing the model, if their p-values turn out to be higher than 0.05, which would mean they have hurt the performance of the model. Power is, however, an inappropriate tool for evaluating response surface designs; use the prediction-based metrics provided in this program via fraction of design space (FDS) statistics. Click on the Graphs tab to find the FDS graph; more information about FDS is available in the Help. Be sure that the model you select contains only terms you expect to be significant (Table 9).

12.2 Leverage

The leverage data shown in Table 9 measure the potential for a design point to influence the fit of the model coefficients, based on its position in the design space. Leverages approaching or at 1 indicate that the point will influence the model: a leverage of 1 means the model must exactly fit the observed value. A good design avoids leverages approaching 1. A design for the same model but having more runs will tend to have a lower leverage for each point.

Watch for leverages close to 1.0. Consider replicating these points or make sure they are run very carefully.
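Leverage can be computed directly as the diagonal of the hat matrix H = X(X'X)^{-1}X'; a small self-contained sketch (ours, illustrating the definition rather than Design Expert's output):

import numpy as np

def leverages(X):
    """Diagonal of the hat matrix H = X (X'X)^-1 X'."""
    H = X @ np.linalg.pinv(X.T @ X) @ X.T
    return np.diag(H)

# Tiny demo: straight-line model over five equally spaced points.
X = np.column_stack([np.ones(5), np.arange(5.0)])
print(leverages(X).round(3))  # [0.6 0.3 0.2 0.3 0.6]
# Leverages always sum to the number of model terms p, so the
# average leverage is p divided by the number of runs n.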

The Graphs tab contains the FDS, perturbation, interaction, contour, cube, and 3D surface plots to help understand the data and the model parameters.

12.3 FDS graph

The FDS graph is used to compute the volume of the design space that has predicted variance less than or equal to the specified value. The fraction of the design space is calculated as this volume divided by the entire volume of the design space. The goal is to make a single plot that shows the cumulative fraction of the design space on the x-axis (from zero to one) versus the prediction variance on the y-axis.

For exploration and optimization, we advise an FDS score of at least 0.8, or 80%; for stability and robustness testing, such as showcasing the design space for quality by design (QbD) work, 100%. The FDS Graph tool provides options for assessing the FDS in relation to four different error categories: mean, prediction intervals, difference between pairs of observations, and tolerance. We use the mean error type, since the aim of this experiment is to find the optimized factor settings for specific response goals. Figure 9 below is the visualization of the FDS graph.

Figure 9.

FDS graph.

There are three parameters, delta, sigma, and alpha, for each type of error, and a fourth parameter, proportion, for the tolerance type of error.

Delta specifies the maximum acceptable half-width (margin of error) of the respective interval for the mean, prediction, and tolerance error types. One good way to find the delta is to answer the question, "plus or minus how much is an acceptable estimate?"

sigma is an estimate for the standard deviation that will appear on the ANOVA. It can be obtained from previous work with this system, work from a similar system, or outright guessing. A smaller sigma can be entered to enhance the FDS if the unexplained nuisance fluctuation can be reduced during the experiment.

Alpha is the significance level used throughout the statistical analysis. Our default is 0.05, or 5%; it is the acceptable risk of a type I error. FDS rises as alpha increases. The critical value is calculated using alpha/2 for two-sided intervals and alpha for one-sided intervals.

Proportion is only used for the tolerance type of error; it is the percentage of the individual outcomes that must fall within the tolerance range. Building a larger design, raising the delta, reducing the sigma, increasing the alpha, and/or decreasing the proportion will boost the FDS score [11, 12].

12.4 Interaction

An interaction occurs when the response to one factor depends on the setting of another. Interactions display as two non-parallel lines, showing that the effect of one factor depends on the level of the other. Figure 10 displays the standard error of the design with the interactions of the model parameters.

Figure 10.

Std error of the design with interactions of the model parameters.

13. Analysis

In the Analysis section in Design Expert, select no transform in the Configure tab and start the analysis using the button at the bottom. The interface should appear like Figure 11 below.

Figure 11.

Starting the analysis.

You can take advantage of the advanced options button to customize the model, such as changing the factor coding from coded to actual (though this is not recommended).

13.1 Fit summary

The regression calculations that fit all of the polynomial models to the chosen response are started when the Fit Summary button is clicked. The program calculates the effects of all model terms and produces statistics such as p-values, lack of fit, and R-squared values for comparing the models. The fit summary output is shown on screen in a report, which can also be printed and/or copied to another application. The "Suggested" model will be highlighted and noted by the program and set as the default model on the Model panel. We look for the following (Table 10):

  • A high-order model explains significantly more of the variation that is in the response (p-value small).

  • Insignificant lack of fit (p-value >0.10).

  • Adjusted R-squared and predicted R-squared have a reasonable level of agreement (within 0.2 of each other).

Term | Standard Error* | VIF | Ri² | Power
A | 0.2706 | 1 | 0.0000 | 91.4%
B | 0.2706 | 1 | 0.0000 | 91.4%
C | 0.2706 | 1 | 0.0000 | 91.4%
AB | 0.3536 | 1 | 0.0000 | 72.2%
AC | 0.3536 | 1 | 0.0000 | 72.2%
BC | 0.3536 | 1 | 0.0000 | 72.2%
A² | 0.2634 | 1.01827 | 0.0179 | 99.9%
B² | 0.2634 | 1.01827 | 0.0179 | 99.9%
C² | 0.2634 | 1.01827 | 0.0179 | 99.9%

Table 8.

Model parameters.

Run | Leverage | Space Type
1 | 0.6073 | Axial
2 | 0.6698 | Factorial
3 | 0.6073 | Axial
4 | 0.6698 | Factorial
5 | 0.1663 | Center
6 | 0.6073 | Axial
7 | 0.1663 | Center
8 | 0.1663 | Center
9 | 0.6698 | Factorial
10 | 0.6698 | Factorial
11 | 0.6698 | Factorial
12 | 0.6693 | Factorial
13 | 0.1663 | Center
14 | 0.1663 | Center
15 | 0.6073 | Axial
16 | 0.6698 | Factorial
17 | 0.6698 | Factorial
18 | 0.1663 | Center
19 | 0.6073 | Axial
20 | 0.6073 | Axial
Average | 0.5000 |

Table 9.

Leverage.

Response 1: Biodiesel Yield
Source | Sequential p-value | Lack of Fit p-value | Adjusted R² | Predicted R² |
Linear | 0.4974 | 0.0082 | −0.0302 | −0.3670 |
2FI | 0.7771 | 0.0060 | −0.1914 | −1.6146 |
Quadratic | 0.0279 | 0.0157 | 0.4440 | −0.9458 | Suggested
Cubic | 0.0635 | 0.0354 | 0.8292 | −6.2531 | Aliased

Table 10.

Fit summary.

Note: Aliased models should be avoided entirely.

13.2 Sequential model sum of squares

Table 11 shows the sum of squares, degrees of freedom, mean square, F-value, and p-value for the design model. The sequential model sum of squares is the sum of the squared deviations from the mean for each model: the SS for the mean is calculated first, followed by the blocks (if applicable), the linear model, the quadratic model, special cubic, cubic, residuals, and the total.

Source | Sum of Squares | df | Mean Square | F-value | p-value |
Mean vs. Total | 68182.63 | 1 | 68182.63 | | |
Linear vs. Mean | 1421.30 | 3 | 473.77 | 0.8337 | 0.4974 |
2FI vs. Linear | 726.96 | 3 | 242.32 | 0.3687 | 0.7771 |
Quadratic vs. 2FI | 4775.31 | 3 | 1591.77 | 5.19 | 0.0279 | Suggested
Cubic vs. Quadratic | 2076.83 | 4 | 519.21 | 5.51 | 0.0635 | Aliased
Residual | 376.84 | 4 | 94.21 | | |
Total | 77559.86 | 18 | 4308.88 | | |

Table 11.

Modeling sequentially, sum of squares.

For each source, the sum of squares divided by the degrees of freedom yields the mean square. This is used to compute the F-value for the models.

The F-value is used to test the significance of adding new model terms to those already in the model. For instance, the significance of the linear terms is tested after removing the effect of the average and the blocks; then the significance of the quadratic terms is tested after removing the average, block, and linear effects; and so on. Select the polynomial of the highest order for which the additional terms are significant and the model is not aliased.

13.3 Model summary statistics

R-squared is the coefficient of determination for the model. It should be close to one. We recommend using the adjusted R-squared for DOE evaluation.

The amount of variation that can be explained by the model is shown by the adjusted R-squared. This is the R-squared value after adjusting for how many terms are in the model relative to the number of design points. The Model summary statistics is shown in Table 12.

Source | Std. Dev. | R² | Adjusted R² | Predicted R² | PRESS |
Linear | 23.84 | 0.1516 | −0.0302 | −0.3670 | 12818.28 |
2FI | 25.64 | 0.2291 | −0.1914 | −1.6146 | 24517.49 |
Quadratic | 17.51 | 0.7383 | 0.4440 | −0.9458 | 18246.34 | Suggested
Cubic | 9.71 | 0.9598 | 0.8292 | −6.2531 | 68014.21 | Aliased

Table 12.

Model summary statistics.

Predicted R-squared is calculated from the PRESS statistic and represents the amount of variation in new data explained by the model. A negative predicted R-squared means that the overall mean is a better predictor than the model.

Focus on the model maximizing the Adjusted R2 and the Predicted R2.
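PRESS itself need not be computed by refitting the model n times; the leave-one-out identity e_i/(1 − h_ii) gives it directly from a single fit. A hedged sketch of PRESS and the predicted R² derived from it (assumes no leverage is exactly 1):

import numpy as np

def predicted_r2(X, y):
    """PRESS-based predicted R^2 for a least-squares fit of y on X."""
    H = X @ np.linalg.pinv(X.T @ X) @ X.T
    h = np.diag(H)                       # leverages
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta                     # ordinary residuals
    press = np.sum((e / (1 - h)) ** 2)   # leave-one-out residual sum of squares
    sst = np.sum((y - y.mean()) ** 2)
    return 1 - press / sst               # can be negative, as seen here

rng = np.random.default_rng(2)
X = np.column_stack([np.ones(15), rng.normal(size=(15, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(0, 0.5, 15)
print(round(predicted_r2(X, y), 3))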

13.4 Lack of fit tests

The data for the lack-of-fit test are shown in Table 13 below. This is the p-value associated with the lack-of-fit calculation for each model. The best model should have an insignificant p-value; a typical cutoff is a p-value > 0.10 to conclude an insignificant lack of fit.

Source | Sum of Squares | df | Mean Square | F-value | p-value |
Linear | 7886.78 | 11 | 716.98 | 31.11 | 0.0082 |
2FI | 7159.83 | 8 | 894.98 | 38.83 | 0.0060 |
Quadratic | 2384.52 | 5 | 476.90 | 20.69 | 0.0157 | Suggested
Cubic | 307.68 | 1 | 307.68 | 13.35 | 0.0354 | Aliased
Pure Error | 69.15 | 3 | 23.05 | | |

Table 13.

Lack of fit tests.

The selected model should have an insignificant lack of fit.

13.5 ANOVA for quadratic model

Table 14 shows the ANOVA data, which is used to test the significance of the result obtained. The model probability (a.k.a. p-value) is the probability of obtaining a model F statistic at least as large as the computed value when in truth there are no factor effects (the data produced false effects). Probabilities less than the acceptable risk (alpha, by default 0.05) are deemed significant and indicate a model effect; values greater than the alpha risk suggest no significant effect.

Source | Sum of Squares | df | Mean Square | F-value | p-value |
Model | 6923.57 | 9 | 769.29 | 2.51 | 0.1049 | not significant
A-Temperature | 1218.74 | 1 | 1218.74 | 3.97 | 0.0813 |
B-Methanol-Oil ratio | 140.27 | 1 | 140.27 | 0.4573 | 0.5179 |
C-Catalyst Weight | 62.29 | 1 | 62.29 | 0.2031 | 0.6642 |
AB | 561.12 | 1 | 561.12 | 1.83 | 0.2132 |
AC | 165.07 | 1 | 165.07 | 0.5382 | 0.4841 |
BC | 0.7564 | 1 | 0.7564 | 0.0025 | 0.9616 |
A² | 836.53 | 1 | 836.53 | 2.73 | 0.1372 |
B² | 4358.27 | 1 | 4358.27 | 14.21 | 0.0055 |
C² | 859.23 | 1 | 859.23 | 2.80 | 0.1327 |
Residual | 2453.67 | 8 | 306.71 | | |
Lack of Fit | 2384.52 | 5 | 476.90 | 20.69 | 0.0157 | significant
Pure Error | 69.15 | 3 | 23.05 | | |
Cor Total | 9377.24 | 17 | | | |

Table 14.

Anova for quadratic model.

The degree to which the model fits the data is measured by the lack of fit. A strong lack of fit (p < 0.05) is an undesirable property, because it shows that the model does not fit the data well; little lack of fit (p > 0.1) is desirable.

The model is not significant relative to the noise, according to the model's F-value of 2.51; the likelihood of noise causing an F-value this large is 10.49%.

Model terms are considered significant when the p-value is less than 0.0500. In this situation, B² is the only significant model term. Model terms are not significant if the value is greater than 0.1000, and model reduction may enhance the model if it contains many insignificant terms (except those needed to maintain hierarchy).

The lack-of-fit F-value of 20.69 indicates that the lack of fit is significant; an F-value this large could be caused by noise in only 1.57% of cases. A significant lack of fit is undesirable, since we want the model to fit.

A negative Predicted R2 as shown in the Fit Statistics data in Table 15 implies that the overall mean may be a better predictor of your response than the current model. In some cases, a higher-order model might be more accurate.

Fit Statistics
Std. Dev. | 17.51 | R² | 0.7383
Mean | 61.55 | Adjusted R² | 0.4440
C.V. % | 28.46 | Predicted R² | −0.9458
| | Adeq Precision | 4.8223

Table 15.

Fit statistics.

Adeq Precision: The signal-to-noise ratio is measured by Adeq Precision, and a ratio of at least 4 is preferred. Our ratio of 4.822 shows an adequate signal, so this model can be used to navigate the design space.

14. Decision

From the ANOVA result, it is obvious the model cannot be deployed as-is; we need to tweak it before using it for optimization, or else the solutions it provides will be misleading. We will remove the interaction terms from the model, since they have lower power (see Table 8), and then repeat only the ANOVA step after this change (Table 16).

14.1 ANOVA for reduced quadratic model

The model is significant, as indicated by the model's F-value of 3.57; the likelihood of noise producing an F-value this large is only 3.25%.

Model terms are considered significant when the p-value is less than 0.0500; in this case, B² is a significant model term. Model terms are not significant if the value is higher than 0.1000. Model reduction may enhance the model if it contains a large number of insignificant terms (excluding those necessary to maintain hierarchy).

The lack-of-fit F-value of 16.87 implies the lack of fit is significant; there is only a 2.02% chance that a lack-of-fit F-value this large could occur due to noise. A significant lack of fit is not okay, since we want the model to fit, but there is some improvement over the full quadratic model, so we can work with this model (Table 17).

Source | Sum of Squares | df | Mean Square | F-value | p-value |
Model | 6196.61 | 6 | 1032.77 | 3.57 | 0.0325 | significant
A-Temperature | 1218.74 | 1 | 1218.74 | 4.21 | 0.0646 |
B-Methanol-Oil ratio | 140.27 | 1 | 140.27 | 0.4851 | 0.5006 |
C-Catalyst Weight | 62.29 | 1 | 62.29 | 0.2154 | 0.6516 |
A² | 836.53 | 1 | 836.53 | 2.89 | 0.1170 |
B² | 4358.27 | 1 | 4358.27 | 15.07 | 0.0026 |
C² | 859.23 | 1 | 859.23 | 2.97 | 0.1127 |
Residual | 3180.62 | 11 | 289.15 | | |
Lack of Fit | 3111.47 | 8 | 388.93 | 16.87 | 0.0202 | significant
Pure Error | 69.15 | 3 | 23.05 | | |
Cor Total | 9377.24 | 17 | | | |

Table 16.

ANOVA for reduced quadratic model.

Std. Dev. | 17.00 | R² | 0.6608
Mean | 61.55 | Adjusted R² | 0.4758
C.V. % | | Predicted R² | −0.3457
| | Adeq Precision | 5.4594

Table 17.

Fit statistics for RQM.

14.2 Fit statistics for RQM

A negative Predicted R2 implies that the overall mean may be a better predictor of your response than the current model. In some cases, a higher-order model may also predict better.

Adeq Precision measures the signal-to-noise ratio. A ratio greater than 4 is desirable. Our ratio of 5.459 indicates an adequate signal. This model can be used to navigate the design space. And there is an improvement in the Adjusted R2 using this reduced Quadratic Model.

14.3 Coefficients in terms of coded factors

The coefficient estimates in Table 18 show the anticipated change in the response for each unit change in the factor value. The intercept in an orthogonal design is the overall average response of all the runs, and the coefficients are adjustments around that average based on the factor settings. The VIFs are 1 when the factors are orthogonal; VIFs greater than 1 indicate multicollinearity, and the higher the VIF, the more severe the correlation of the terms. VIFs of less than 10 are generally acceptable.

Factor | Coefficient Estimate | df | Standard Error | 95% CI Low | 95% CI High | VIF
Intercept | 88.05 | 1 | 8.49 | 69.37 | 106.74 |
A-Temperature | −9.45 | 1 | 4.60 | −19.57 | 0.6808 | 1.0000
B-Methanol-Oil ratio | −3.20 | 1 | 4.60 | −13.33 | 6.92 | 1.0000
C-Catalyst Weight | 2.14 | 1 | 4.60 | −7.99 | 12.26 | 1.0000
A² | −8.13 | 1 | 4.78 | −18.66 | 2.39 | 1.08
B² | −18.56 | 1 | 4.78 | −29.09 | −8.04 | 1.08
C² | −8.24 | 1 | 4.78 | −18.76 | 2.28 | 1.08

Table 18.

Coefficients as codified factors.

14.4 Final equation in terms of coded factors

You can apply the equation in terms of coded factors in Table 19 to make predictions about the response at given levels of each factor. By default, the high levels of the factors are coded as +1 and the low levels as −1. The coded equation is useful for determining the relative importance of the factors by comparing their coefficients.

Biodiesel Yield = +80.46 − 9.45 A − 3.20 B + 2.13 C − 5.54 A² − 15.97 B² − 5.65 C²

Table 19.

Final equation in terms of coded factors.
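As an illustration (our sketch, not a Design Expert feature), the coded equation of Table 19 can be wrapped in a small function for quick predictions; the coded values are obtained from the ranges in Table 2:

def yield_coded(A, B, C):
    """Biodiesel yield predicted by the coded-factor equation of Table 19."""
    return (80.46 - 9.45*A - 3.20*B + 2.13*C
            - 5.54*A**2 - 15.97*B**2 - 5.65*C**2)

# At the center point (65 degC, 22.5:1, 3.5 wt%), all coded values are 0:
print(yield_coded(0, 0, 0))  # 80.46, the model's prediction at the center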

14.5 Final equation in terms of actual factors

The equation in terms of actual factors in Table 20 can be used to make predictions about the response at given levels of each factor, where the levels are specified in the original units. This equation should not be used to determine the relative importance of each factor, because the coefficients are scaled to accommodate the units of each factor and the intercept is not at the center of the design space.

Biodiesel Yield = −902.92058 + 26.92040 × Temperature + 12.34881 × Methanol-Oil ratio + 19.01074 × Catalyst Weight − 0.221613 × Temperature² − 0.283914 × Methanol-Oil ratio² − 2.51265 × Catalyst Weight²

Table 20.

Final equation using actual factors.

14.6 Diagnostics plots

Externally studentized residuals are the default, with raw residuals and internally studentized residuals also available. Unless all the runs in a design have the same leverage, the standard errors of the residuals differ, so each raw residual represents a different population (one for each different standard error). As a result, it is not recommended to validate the regression assumptions using raw residuals. Studentizing the residuals maps all of the individual normal distributions onto a single standard normal distribution. Externally studentized residuals, based on a deletion procedure, are the default because they are more sensitive to detecting issues with the analysis; internally studentized residuals are also available but are less sensitive to finding such problems. The diagnostics plots from the Design Expert software are shown in Figure 12.

Figure 12.

A diagnostics plots.

Normal Probability: If the residuals follow a normal distribution, they should follow a straight line, according to the normal probability plot. Even with typical data, expect some scatter. Only focus on distinct patterns, such as an “S-shaped” curve, which suggests that a response modification might lead to a more accurate analysis.

Residuals vs. Predicted: This is a plot of the residuals versus the ascending predicted response values. The idea of constant variance is tested. The plot needs to be random scatter (residuals should have a constant range across the graph). This plot’s expanding variance (“megaphone pattern”) suggests that a transformation is required.

Predicted vs. Actual: A graph of predicted versus actual response values. The purpose is to detect a value, or group of values, that is not easily predicted by the model.

Leverage: A measurement of each point's influence on the model's fit. When a point's leverage is 1, the model exactly fits the observation at that point, so that point strongly influences the model. A run with more than twice the average leverage is generally regarded as having high leverage; there are not many runs like it in the factor space. The average leverage is calculated by dividing the number of terms in the model by the number of design runs.

14.7 Model graphs

All the model graphs, which can be used to derive insights into the responses for all input data, are shown in Figures 13-18 respectively.

Figure 13.

All factors response.

Figure 14.

Interactions.

Figure 15.

Contour plot.

Figure 16.

Predicted vs. actual.

Figure 17.

3D surface plot.

Figure 18.

Cube plot.

15. Optimization

Here, our goal is to maximize the biodiesel yield with the factors kept within the ranges (lower and upper limits) summarized in Table 21 below.

15.1 Solutions

The Design Expert software iterated over the ranges of all the factors and found the maximum yield. There are 100 possible solutions; we select the one suggested by the software, shown below in Table 22.

Name | Goal | Lower Limit | Upper Limit | Lower Weight | Upper Weight | Importance
A: Temperature | is in range | 60 | 70 | 1 | 1 | 3
B: Methanol-Oil ratio | is in range | 15 | 30 | 1 | 1 | 3
C: Catalyst Weight | is in range | 2 | 5 | 1 | 1 | 3
Biodiesel Yield | maximize | 10.66 | 90.98 | 1 | 1 | 5

Table 21.

Constraints.
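Outside Design Expert, a comparable constrained maximization can be sketched with SciPy, using the actual-factor equation from Table 20 and the ranges from Table 21. This is an illustration rather than a reproduction of the software's desirability search, but the factor settings it returns are comparable to the solutions in Table 22:

from scipy.optimize import minimize

def neg_yield(x):
    T, R, W = x  # temperature, methanol-oil ratio, catalyst weight
    # Final equation in actual factors (Table 20), negated because we minimize.
    return -(-902.92058 + 26.92040*T + 12.34881*R + 19.01074*W
             - 0.221613*T**2 - 0.283914*R**2 - 2.51265*W**2)

res = minimize(neg_yield, x0=[65.0, 22.5, 3.5],
               bounds=[(60, 70), (15, 30), (2, 5)])
print(res.x.round(3), round(-res.fun, 2))  # settings and the predicted yield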

100 solutions found
Number | Temperature | Methanol-Oil ratio | Catalyst Weight | Biodiesel Yield | Desirability |
1 | 61.818 | 22.128 | 3.760 | 91.007 | 1.000 | Selected
2 | 62.522 | 21.568 | 3.704 | 90.987 | 1.000 |
3 | 62.223 | 21.803 | 3.668 | 91.064 | 1.000 |
4 | 61.717 | 21.882 | 3.708 | 91.025 | 1.000 |
5 | 62.126 | 21.973 | 3.759 | 91.052 | 1.000 |
6 | 62.021 | 21.746 | 3.772 | 91.045 | 1.000 |
7 | 62.295 | 21.826 | 3.807 | 91.013 | 1.000 |
8 | 62.025 | 22.087 | 3.706 | 91.052 | 1.000 |
9 | 62.183 | 21.870 | 3.759 | 91.055 | 1.000 |
10 | 62.188 | 22.001 | 3.632 | 91.049 | 1.000 |

Table 22.

Optimization solutions.

16. Conclusion

In this chapter, we have extensively applied central composite design to optimize biodiesel synthesis using an eggshell-derived catalyst, with Design Expert 13 used to provide in-depth statistical analysis. A reduced quadratic model with a significant p-value of 0.0325 was accepted, since the full quadratic model had an insignificant p-value. The model is significant, as indicated by its F-value of 3.57; an F-value this large would be caused by noise in only 3.25% of cases. The number of experimental runs was reduced to 18, compared to the 20 runs used by the original experimenters, and we also obtained a higher yield of 91%, compared to the 89% obtained in the original study.

Acknowledgments

I acknowledge my co-author, Dr. C.N. Njoku, for his help and support in gathering this information and for inspiring the success of this work. I also want to extend my gratitude to my mother for always providing her special support in the little ways she could.

References

  1. Bhattacharya S. Central composite design for response surface methodology and its application in pharmacy. In: Response Surface Methodology in Engineering Science. London, UK: IntechOpen; 2021. DOI: 10.5772/INTECHOPEN.95835
  2. Wikipedia contributors. Central composite design. In: Wikipedia, The Free Encyclopedia. 2020. Available from: https://en.wikipedia.org/w/index.php?title=Central_composite_design&oldid=954106283 [Accessed: May 14, 2022]
  3. Skartland LK, Mjos SA, Grung B. Experimental designs for modeling retention patterns and separation efficiency in the analysis of fatty acid methyl esters by gas chromatography-mass spectrometry. Journal of Chromatography A. 2011;1218:6823-6831
  4. Tshizanga N, Aransiola EF, Oyekola O. Optimization of biodiesel production from waste vegetable oil and eggshell ash. South African Journal of Chemical Engineering. 2017;23:145-156. DOI: 10.1016/j.sajce.2017.05.003
  5. Manohar M, Joseph J, Selvaraj T, Sivakumar D. Application of Box-Behnken design to optimize the parameters for turning Inconel 718 using coated carbide tools. International Journal of Scientific and Engineering Research. 2013;4(620):642
  6. Breyfogle FW. Chapter 17. In: Statistical Methods for Testing, Development, and Manufacturing. New York: John Wiley & Sons; 1992. 252 p
  7. Singh B, Kumar R, Ahuja N. Optimizing drug delivery systems using systematic "design of experiments." Part I: Fundamental aspects. Critical Reviews in Therapeutic Drug Carrier Systems. 2005;22(1):27-105
  8. Cavazzuti M. Design of experiments. In: Optimization Methods. Berlin, Heidelberg: Springer; 2013. pp. 13-42
  9. Hassanein HM, Abd-Rabou AS, Sakr SM. Design optimization of transverse flux linear motor for weight reduction and performance improvement using response surface methodology and genetic algorithms. IEEE Transactions on Energy Conversion. 2010;25(3):598-605
  10. Anderson MJ, Whitcomb PJ. RSM Simplified. New York: Productivity, Inc.; 2016
  11. De Gryze S, Langhans I, Vandebroek M. Using the correct intervals for prediction: A tutorial on tolerance intervals for ordinary least-squares regression. Chemometrics and Intelligent Laboratory Systems. 2007;87(2):147-154
  12. Zahran A, Anderson-Cook CM, Myers RH. Fraction of design space to assess prediction capability of response surface designs. Journal of Quality Technology. 2003;35(4):377-386
