Open access peer-reviewed chapter

Phoebe Framework and Experimental Results for Estimating Fetal Age and Weight

By Loc Nguyen, Truong-Duyet Phan and Thu-Hang T. Ho

Submitted: November 6th 2017; Reviewed: February 5th 2018; Published: August 1st 2018

DOI: 10.5772/intechopen.74883



Fetal age and weight estimation plays an important role in prenatal care. Many estimation formulas have been created by combining statistics and obstetrics. However, such formulas give optimal estimates only when they are applied to the specific community for which they were built. This research proposes the so-called Phoebe framework, which helps physicians and scientists find the most accurate formulas for the community where they conduct their research. The built-in algorithm of Phoebe framework uses statistical regression to estimate fetal age and weight from fetal ultrasound measures such as bi-parietal diameter, head circumference, abdominal circumference, fetal length, arm volume, and thigh volume. This algorithm relies on heuristic assumptions that aim to produce good estimation formulas as fast as possible. In the experiments, the framework produces optimal formulas with high adequacy and accuracy. Moreover, the framework gives physicians and scientists facilities for exploiting useful statistical information in pregnancy data. Phoebe framework is computer software available at


  • fetal age estimation
  • fetal weight estimation
  • ultrasound measures
  • regression model
  • estimation formula

1. Introduction

Fetal age and weight estimation aims to predict the birth weight or birth age before delivery. It is very important because it helps doctors diagnose abnormal or diseased cases and decide on treatments for them. Because this research addresses both age estimation and weight estimation, the term “birth estimation” is used for convenience to cover both. There are two methods for birth estimation:

  • Determining the volume of the fetus inside the mother's womb and then calculating fetal weight from this volume and the mass density of flesh and bone. Alternatively, fetal age and weight can be estimated from the size of the mother's womb.

  • Applying a statistical regression model: fetal ultrasound measures such as bi-parietal diameter (bpd), head circumference (hc), abdominal circumference (ac), fetal length (fl), arm volume (arm_vol), and thigh volume (thigh_vol) are recorded and used as the input sample for regression analysis, which results in a regression function. This function is the formula for estimating fetal age and weight from ultrasound measures such as bpd, hc, ac, fl, arm_vol, and thigh_vol. Data composed of these ultrasound measures are called a gestational sample or pregnant sample. The terms “sample” and “data” have the same meaning in this research. A sample represents the population where the research takes place.

Because the second method reflects features of the population through statistical data, the regression model is chosen for birth estimation in this research. Note that the terms function, regression function, estimation function, regression model, estimation model, formula, regression formula, and estimation formula have the same meaning.

Many estimation formulas have resulted from gestational research such as [1, 2, 3, 4, 5, 6, 7, 8, 9]. Some of them achieve high accuracy, but they are only appropriate to the population, community, or ethnic group where the research was done. If we apply these formulas to another community, such as Vietnam, they are no longer accurate. Moreover, it is difficult to find a new and effective estimation formula, and formula discovery is expensive in time and (computer) resources. Therefore, the first goal of this research is to propose an effective built-in algorithm that produces, as fast as possible, highly accurate formulas that are easy to tune to a specified population. In addition, physicians and researchers always want to discover useful statistical information from measure samples and regression models. Thus, the second goal of this research is to provide them with a framework called Phoebe framework or Phoebe system. Phoebe framework implements the built-in algorithm of the first goal and provides a tool that allows physicians and researchers to exploit useful information in gestational samples. This tool is programmed as computer software. Moreover, Phoebe framework allows software developers to modify its modules; for example, developers can improve the built-in algorithm by adding heuristic constraints.

This chapter is an improved combination of our two articles “A framework of fetal age and weight estimation” [10] and “Experimental Results of Phoebe Framework: Optimal Formulas for Estimating Fetus Weight and Age” [11]. Section 2 gives an overview of the architecture of Phoebe framework. Section 3 describes the built-in algorithm that produces optimal formulas appropriate to a concrete population such as that of Vietnam; this algorithm is the core of Phoebe framework. Section 4 discusses the main use cases of the framework with respect to gestational samples. As experimental results, some interesting estimation formulas produced by the framework are described in Section 5. Early weight estimation is proposed in Section 6. The conclusion is given in Section 7. Note that Phoebe framework uses the statistics package “Java Scientific Library” by Michael Thomas Flanagan [12] and the parsing package “A Java expression parser” by Jos de Jong [13]. The package “Java Scientific Library” is the most important of these. The framework is implemented in the Java language [14].


2. General architecture of Phoebe framework

Based on clinical input data, which include fetal ultrasound measures such as bpd, hc, ac, and fl, the framework produces optimal formulas for estimating fetal weight and fetal age with the highest precision. Moreover, statistical information about the fetus and gestation is described in detail in two forms: numerical format and graph format. Accordingly, the framework consists of the following four components:

  • Dataset component is responsible for managing information about fetal ultrasound measures such as bpd, hc, ac, and fl, and extra gestational information, in a reasonable and intelligent manner. This component allows other components to retrieve such information. Gestational information is organized into an abstract structure, for example, a matrix where each row represents a sample of bpd, hc, ac, and fl measures. Table 1 is an example of this abstract structure.

  • Regression component represents the estimation formula or regression function. This component reads ultrasound information from Dataset component and builds the optimal estimation formula from it. The built-in algorithm used to discover and construct estimation formulas is discussed in Section 3. This component is the most important one because it implements this discovery algorithm.

  • Statistical Manifest component describes statistical information about both ultrasound measures and regression functions, for example, the mean and standard deviation of bpd samples, the sum of residuals, the correlation coefficient of a regression function, and the percentile graph of fetal weight. The statistical manifest is organized in two forms: numerical format and graph format.

  • User Interface (UI) component provides interaction between the system and users such as physicians and researchers. A popular use case is that users enter ultrasound measures and ask the system to print both the optimal estimation formula and statistical information about these ultrasound measures; moreover, users can retrieve other information from Dataset component. UI component links to all other components so as to give users as many facilities as possible.

| bpd | hc | fl | ac | Fetal age | Fetal weight |
|---|---|---|---|---|---|

Table 1.

An example of gestational sample matrix.

The three components Dataset, Regression, and Statistical Manifest are the basic components. The fourth component, User Interface, is the bridge among them. Figure 1 shows the general architecture of Phoebe framework.

Figure 1.

General architecture of Phoebe framework.
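To make the division of responsibilities concrete, the following Python sketch models two of the components: a Dataset that stores the gestational sample as rows laid out like Table 1, and a Statistical Manifest that derives numeric summaries from it. This is a minimal illustration under our own assumptions; the class and method names are hypothetical and do not reflect the Java API of Phoebe framework.

```python
from dataclasses import dataclass, field
import statistics

# Column layout of one row, following Table 1.
COLUMNS = ["bpd", "hc", "fl", "ac", "age", "weight"]

@dataclass
class Dataset:
    """Hypothetical Dataset component: one row per case, columns as in Table 1."""
    rows: list = field(default_factory=list)

    def add(self, bpd, hc, fl, ac, age, weight):
        self.rows.append((bpd, hc, fl, ac, age, weight))

    def column(self, name):
        """Return all values of one measure, e.g. column("bpd")."""
        idx = COLUMNS.index(name)
        return [row[idx] for row in self.rows]

class StatisticalManifest:
    """Hypothetical Statistical Manifest component (numeric format only)."""

    def __init__(self, dataset):
        self.dataset = dataset  # reads data only through the Dataset component

    def summary(self, name):
        values = self.dataset.column(name)
        return {
            "mean": statistics.mean(values),
            "median": statistics.median(values),
            "stdev": statistics.stdev(values),  # sample standard deviation
        }
```

In this sketch the manifest never touches the raw rows directly, mirroring the idea that other components retrieve information through Dataset component.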

3. Built-in algorithm of Phoebe framework

Phoebe framework uses a regression model for estimating fetal weight and age. Suppose a linear regression function Y = α0 + α1X1 + α2X2 + … + αnXn, where Y is fetal weight or age and the Xi are gestational ultrasound measures such as bpd, hc, ac, and fl. Variable Y is called the response variable or dependent variable. Each Xi is called a regression variable, regressor, or independent variable. Each αi is called a regression coefficient. Given a set of measured values of the Xi, the value of Y calculated from this regression function, called Y-estimated, is the estimated fetal weight (or age), which is compared with the real value of Y. The real value of Y, called Y-real, is the fetal weight (or age) available in the sample. In this research, the notation Y refers implicitly to Y-estimated unless explained otherwise. The deviation between Y-estimated and Y-real, also called the estimation error, is a criterion used to assess the quality or precision of a regression function: the smaller the deviation, the better the regression function. The goal of this research is to find the optimal regression function, or estimation formula, with the highest precision.

A regression function is good if it meets the following two conditions:

  • The correlation between Y-estimated and Y-real is large.

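  • The sum of residuals is small, where a residual is the square of the deviation between Y-estimated and Y-real. Over the N cases of the sample, the sum of residuals is:

$$SS = \sum_{i=1}^{N} \left( Y^{\mathrm{estimated}}_i - Y^{\mathrm{real}}_i \right)^2$$
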
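Both optimal conditions can be computed directly from paired estimated and real values. The Python sketch below is an illustration only (the function names are hypothetical, not part of Phoebe framework): it computes the sum of residuals and the Pearson correlation coefficient between Y-estimated and Y-real.

```python
import math

def sum_of_residuals(y_estimated, y_real):
    """SS: sum of squared deviations between estimated and real values."""
    return sum((e - r) ** 2 for e, r in zip(y_estimated, y_real))

def correlation(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

A function is preferred when `correlation(y_estimated, y_real)` is larger and `sum_of_residuals(y_estimated, y_real)` is smaller.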
These two conditions are called the pair of optimal conditions. A regression function is optimal, or best, if it satisfies the pair of optimal conditions to the greatest extent, that is, the correlation between Y-estimated and Y-real is largest and the sum of residuals is smallest. Given a set of regression variables Xi (where i = 1, 2, …, n), a regression function is a combination of k variables Xi, where k ≤ n, chosen so that the combination achieves the pair of optimal conditions. Given a set of possible regression variables VAR = {X1, X2, …, Xn} being ultrasound measures, a brute-force algorithm can find the optimal function in the following three steps:

  1. Initialize the indicator number k to 1; k is the number of regression variables in a k-combination.

  2. Create all combinations of the n variables taken k at a time. For each k-combination, evaluate the function built from its k variables on the pair of optimal conditions; the function that best satisfies these conditions so far is kept as the optimal function.

  3. Increase k by 1. If k > n, the algorithm stops; otherwise, go back to step 2.
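The brute-force steps above can be sketched in Python as follows. The scoring function is a hypothetical placeholder supplied by the caller (for example, the correlation between Y-estimated and Y-real of the fitted function); the sketch only shows how the k-combinations are enumerated and counted.

```python
from itertools import combinations
from math import comb

def brute_force_search(variables, score):
    """Try every k-combination for k = 1..n and keep the highest-scoring one.

    `score` is a caller-supplied (hypothetical) function mapping a tuple of
    variable names to a number, where larger means better.
    """
    best, best_score, evaluated = None, float("-inf"), 0
    for k in range(1, len(variables) + 1):        # steps 1 and 3: k = 1..n
        for combo in combinations(variables, k):   # step 2: all k-combinations
            evaluated += 1
            s = score(combo)
            if s > best_score:
                best, best_score = combo, s
    return best, evaluated

def number_of_combinations(n):
    """Sum of C(n, k) for k = 1..n, which equals 2**n - 1."""
    return sum(comb(n, k) for k in range(1, n + 1))
```

With the four 2D measures bpd, hc, ac, and fl the search evaluates 15 combinations; with 20 candidate variables it would already evaluate 1048575, before even considering the different model kinds.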

The number of combinations that the brute-force algorithm browses is:

$$\sum_{k=1}^{n} \binom{n}{k} = \sum_{k=1}^{n} \frac{n!}{k!\,(n-k)!} = 2^n - 1$$

where n is the number of regression variables and “k!” denotes the factorial of k. If n is large, the number of combinations is huge, so the brute-force algorithm cannot finish in practical time and it becomes impossible to find the best function. Moreover, there are many kinds of regression function, such as linear, quadratic, cubic, logarithm, exponent, and product, which enlarges the search space further. Therefore, we propose an algorithm that overcomes this drawback and always returns an optimal function. In other words, the proposed algorithm is guaranteed to terminate, and its time cost is decreased significantly because the search space is reduced as much as possible. The proposed algorithm is called the seed germination (SG) algorithm. SG is the built-in algorithm and the core of Phoebe framework. It is a heuristic algorithm based on the following pair of heuristic assumptions:

  • First assumption: the regression variables Xi tend to be mutually independent, meaning that any pair Xi and Xj with i ≠ j in an optimal function are mutually independent. Independence is relaxed to the looser condition “the correlation coefficient of any pair of Xi and Xj is less than a threshold δ.” This is the minimum assumption.

  • Second assumption: each variable Xi contributes to the quality of the optimal function. The contribution rate of a variable Xi is defined as the correlation coefficient between that variable and Y-real. The higher the contribution rate, the more important the respective variable. Variables with a high contribution rate are called contributive variables, and the optimal function includes only contributive regression variables. The second assumption is stated as “the correlation coefficient of any regression variable Xi and the real response value Y-real is greater than a threshold ε.” This is the maximum assumption.

SG algorithm tries to find a combination of regression variables Xi that satisfies this pair of heuristic assumptions. In other words, this combination is expected to constitute an optimal regression function that satisfies the following pair of heuristic conditions ([10] p. 22):

  • The correlation coefficient of any pair of Xi and Xj is less than the minimum threshold δ > 0. This condition corresponds to the minimum assumption and is called the minimum condition or independence condition.

  • The correlation coefficient of any Xi and Y-real is greater than the maximum threshold ε > 0. This condition corresponds to the maximum assumption and is called the maximum condition or contribution condition.
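A direct reading of the two heuristic conditions is sketched below in Python. Two choices are our assumptions rather than statements from the chapter: Pearson correlation is used throughout, and absolute values of the coefficients are compared with δ and ε, so that a strong negative correlation also counts as dependence or contribution. The function names are hypothetical.

```python
import math
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def satisfies_heuristic_conditions(columns, y_real, delta, epsilon):
    """columns maps each regression variable name to its sample values."""
    # Minimum (independence) condition: every pair is weakly correlated.
    for a, b in combinations(columns, 2):
        if abs(pearson(columns[a], columns[b])) >= delta:
            return False
    # Maximum (contribution) condition: every variable tracks Y-real.
    return all(abs(pearson(v, y_real)) > epsilon for v in columns.values())
```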

Given a set of possible regression variables VAR = {X1, X2, …, Xn} being ultrasound measures, let f = α0 + α1X1 + α2X2 + … + αkXk (k ≤ n) be an estimation function and let Re(f) = {X1, X2, …, Xk} be its regression variables. Note that the value of f is fetal age or fetal weight. Re(f) is considered the representation of f. Let OPTIMAL be the output of SG algorithm, which is the set of returned optimal functions; OPTIMAL is initialized as an empty set. Let Re(OPTIMAL) be the set of regression variables contained in all optimal functions f ∈ OPTIMAL. SG algorithm has the following four steps ([10] p. 22):

  1. Let C be the complement of Re(OPTIMAL) with regard to VAR, C = VAR\Re(OPTIMAL), where the backslash “\” denotes the set difference operator. That is, C contains the variables that are in VAR but not in Re(OPTIMAL).

  2. Let G ⊆ C be the list of regression variables satisfying the pair of heuristic conditions. If G is empty, the algorithm terminates; otherwise, go to step 3.

  3. Iterate over G to find the candidate list of good functions. Let CANDIDATE be this candidate list, initialized as an empty set. For each regression variable X ∈ G, let L be the union of the regression variables of an optimal function and X, L = Re(f) ∪ {X}, where f ∈ OPTIMAL (while OPTIMAL is still empty, L = {X}). Let g be the new function created from L; in other words, the regression variables of g belong to L, Re(g) = L. If function g meets the pair of heuristic conditions, it is added to CANDIDATE: CANDIDATE = CANDIDATE ∪ {g}.

  4. Let BEST be the set of best functions taken from CANDIDATE, that is, the functions in CANDIDATE that satisfy the pair of optimal conditions to the greatest extent, with the largest correlation and the smallest sum of residuals. If BEST equals OPTIMAL, the algorithm stops; otherwise, assign BEST to OPTIMAL and go back to step 1. Two sets are equal if they contain the same elements.
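The four steps can be sketched in Python as below. This is a simplified reading under several assumptions of ours, not the Java implementation inside Phoebe framework: only linear models are fitted (ordinary least squares via the normal equations), G is taken to be the variables of C whose absolute correlation with Y-real exceeds ε, BEST keeps a single function ranked first by correlation and then by sum of residuals, and the loop also stops when CANDIDATE is empty. All function names are hypothetical.

```python
import math
from itertools import combinations

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit(cols, y):
    """Least-squares fit of y = a0 + a1*c1 + ...; returns (coefficients, predictions)."""
    rows = [[1.0] + [c[i] for c in cols] for i in range(len(y))]
    p = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    aty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    beta = solve(ata, aty)
    return beta, [sum(b * v for b, v in zip(beta, r)) for r in rows]

def heuristic_ok(names, data, y, delta, epsilon):
    """Minimum and maximum conditions on a set of variable names."""
    if any(abs(pearson(data[a], data[b])) >= delta for a, b in combinations(sorted(names), 2)):
        return False
    return all(abs(pearson(data[v], y)) > epsilon for v in names)

def seed_germination(data, y, delta, epsilon):
    """Return OPTIMAL as a list of variable sets Re(f)."""
    optimal = []
    while True:
        used = set().union(*optimal)                                # Re(OPTIMAL)
        c = [v for v in data if v not in used]                     # step 1
        g = [v for v in c if abs(pearson(data[v], y)) > epsilon]   # step 2
        if not g:
            return optimal
        candidates = []                                             # step 3
        for x in g:
            for base in optimal or [frozenset()]:   # L = {X} while OPTIMAL is empty
                l = base | {x}
                if heuristic_ok(l, data, y, delta, epsilon):
                    _, pred = fit([data[v] for v in sorted(l)], y)
                    ss = sum((e - t) ** 2 for e, t in zip(pred, y))
                    candidates.append((pearson(pred, y), -ss, l))
        if not candidates:                          # assumption: nothing new qualifies
            return optimal
        best = [max(candidates, key=lambda t: t[:2])[2]]            # step 4
        if best == optimal:
            return optimal
        optimal = best
```

Because every candidate extends the current optimal set by one seed variable, the loop returns the longest function that still satisfies the heuristic conditions, in line with the remark below that the resulting function is the longest one.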

Figure 2 shows the flow chart of SG algorithm.

Figure 2.

Flow chart of SG algorithm.

SG algorithm was described in the article “A framework of fetal age and weight estimation” ([10] pp. 21–23). The essence of SG algorithm is to reduce the search space by choosing regression variables that satisfy the heuristic assumptions as “seeds”; optimal functions are composed of these seeds. The algorithm always delivers the best functions but can lose other good functions. The length of a function is defined as the number of its regression variables. The termination condition is that no more optimal functions can be found or all possible variables have been browsed. Therefore, the resulting function is the longest and best one, but some other, shorter functions may also be significantly good.

In the current implementation of SG algorithm, the minimum threshold δ is arbitrary. The implementation also supports the nonlinear regression models shown in Table 2.


Table 2.

Nonlinear regression models.

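The notations “exp” and “log” denote the exponential function and the natural logarithm, respectively. Most nonlinear regression models can be transformed into linear regression models. For example, the product model

$$Y = \alpha_0 X_1^{\alpha_1} X_2^{\alpha_2} \cdots X_n^{\alpha_n}$$

becomes linear after taking logarithms of both sides:

$$\log(Y) = \log(\alpha_0) + \alpha_1 \log(X_1) + \alpha_2 \log(X_2) + \cdots + \alpha_n \log(X_n)$$

Setting $U = \log(Y)$, $Z_i = \log(X_i)$, $\beta_0 = \log(\alpha_0)$, and $\beta_i = \alpha_i$, the product model becomes the linear model with regard to variables U, Zi and coefficients βi:

$$U = \beta_0 + \beta_1 Z_1 + \beta_2 Z_2 + \cdots + \beta_n Z_n$$
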
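The substitution for the product model can be checked numerically. The Python sketch below (function names are hypothetical) maps one case of a product model to its linear form and maps linear coefficients βi back to αi; any linear least-squares routine could then be applied to the transformed data.

```python
import math

def product_to_linear(x_row, y):
    """Transform one case: Z_i = log(X_i) and U = log(Y)."""
    return [math.log(x) for x in x_row], math.log(y)

def linear_to_product(betas):
    """Recover the product coefficients: alpha_0 = exp(beta_0), alpha_i = beta_i."""
    return [math.exp(betas[0])] + list(betas[1:])
```

For example, with α0 = 2, α1 = 0.5, α2 = 1.5 and X = (4, 9), the product model gives Y = 2 · 2 · 27 = 108, and U = log(108) equals β0 + β1Z1 + β2Z2 with β0 = log(2).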
Table 3 shows how to transform nonlinear models into linear models.

| Transformation | Nonlinear model | Substitution |
|---|---|---|
| Polynomial | Y = α0 + α1(X1 + X2 + … + Xn)^k | Z1 = (X1 + X2 + … + Xn)^k |
| Logarithm | Y = α0 + α1 log(X1) + α2 log(X2) + … + αn log(Xn) | Zi = log(Xi) |
| Logarithm | Y = α0 + α1 log(X1 + X2 + … + Xn) | Z1 = log(X1 + X2 + … + Xn) |
| Exponent | Y = exp(α0 + α1X1 + α2X2 + … + αnXn) | U = log(Y) |
| Exponent | Y = exp(α0 + α1(X1 + X2 + … + Xn)) | U = log(Y), Z1 = X1 + X2 + … + Xn |
| Product | Y = α0 X1^α1 X2^α2 … Xn^αn | U = log(Y), Zi = log(Xi), β0 = log(α0), βi = αi |

Table 3.

Transformation of nonlinear models into linear models.
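The remaining rows of Table 3 follow the same pattern: the regressors and/or the response are transformed, after which an ordinary linear model is fitted. A minimal Python sketch of these substitutions follows; the function names are hypothetical, and the polynomial row is read as the single term (X1 + … + Xn)^k exactly as Table 3 lists it.

```python
import math

def polynomial_z(xs, k):
    """Polynomial model: Z1 = (X1 + ... + Xn) ** k."""
    return [sum(xs) ** k]

def logarithm_z(xs):
    """Logarithm model: Zi = log(Xi)."""
    return [math.log(x) for x in xs]

def logarithm_of_sum_z(xs):
    """Logarithm model over the sum: Z1 = log(X1 + ... + Xn)."""
    return [math.log(sum(xs))]

def exponent_of_sum_z(xs):
    """Exponent model over the sum: Z1 = X1 + ... + Xn (response becomes U = log(Y))."""
    return [sum(xs)]

def exponent_u(y):
    """Exponent models: U = log(Y)."""
    return math.log(y)

def linear_value(coefs, zs):
    """Evaluate coefs[0] + coefs[1]*Z1 + ... on the transformed regressors."""
    return coefs[0] + sum(c * z for c, z in zip(coefs[1:], zs))
```

For instance, Y = exp(α0 + α1(X1 + X2)) satisfies exponent_u(Y) == linear_value([α0, α1], exponent_of_sum_z([X1, X2])) up to floating-point rounding.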

With the built-in SG algorithm, Phoebe framework can also be used for regression applications beyond birth estimation.

4. Use cases of Phoebe framework

Phoebe framework has three basic use cases, realized by the three components discussed in Section 2: dataset, regression model, and statistical manifest. The three basic use cases are:

  1. Discovering optimal formulas with high accuracy. Optimal formulas are results of SG algorithm described in Section 3.

  2. Providing statistical information about the gestational sample. Statistical information is in numeric format and graph format.

  3. Comparison among different formulas.

Use case 1: Discovering optimal formulas

The gestational data [15] consist of two-dimensional ultrasound measures of pregnant women taken at Vinh Long General Hospital, Vietnam, including bi-parietal diameter (bpd), head circumference (hc), abdominal circumference (ac), and fetal length (fl). Fetal age ranges from 28 to 42 weeks. Fetal weight is measured in grams. The gestational sample is shown in Figure 3.

Figure 3.

Gestational sample.

After specifying the maximum threshold ε (fitness value) and choosing which measures are regression variables and which is the response variable, the user presses the “Estimate” button to retrieve the optimal formulas produced by SG algorithm. These optimal formulas are shown in Figure 4, where the regression variables are bpd, hc, ac, and fl, the response variable is fetal weight, and the threshold ε is 0.6.

Figure 4.

Optimal weight estimation formulas.

An estimation formula with one or two regressors (ultrasound measures) can be represented as a graph. In the illustrative Figure 5, the horizontal axis indicates the measure bpd in millimeters, and the right vertical axis indicates the measure ac in millimeters. The left vertical axis shows the estimated weight.

Figure 5.

Estimation graph for estimating fetal weight.

The graph in Figure 5 has 11 estimation lines, drawn as internal (red) lines. Each estimation line corresponds to a small interval of ac, and fetal weight along each estimation line ranges from 900 to 4800 g. This is a way to show a three-dimensional function as a two-dimensional graph. For example, suppose we need to estimate fetal weight given bpd = 90 and ac = 300. Because ac is 300 mm, we look at the sixth estimation line, counting from the bottom. The intersection point between bpd = 90 and the sixth estimation line is projected onto the left vertical axis. Because this intersection point is near the midpoint of the weight range on the sixth estimation line, the estimated fetal weight is approximately (4800 − 900)/2 + 900 = 2850 g.

Use case 2: Providing statistical information

Statistical information is classified into two groups: gestational information and estimation information.

  • Gestational information contains statistical attributes about fetal ultrasound measures, for example, mean, median and standard deviation of bpd.

  • Estimation information contains attributes about estimation model, for example, correlation coefficient, sum of residuals and estimation error of estimation formula.

Statistical information is represented in two forms: numeric format and graph format. Figure 6 shows statistical attributes (mean, median, standard deviation, histogram, etc.) of fetal age and of the ultrasound measures bpd, hc, ac, and fl.

Figure 6.

Gestational statistical information.

Figure 7 shows a full description of the weight estimation formula weight = 0.000043 * (bpd^1.948640) * (hc^0.263745) * (fl^0.601972) * (ac^0.905524). For instance, its sum of residuals (SS) is 46412446.0047 and its estimation error is −7.4655 ± 212.5571. Note that the sign “^” denotes exponentiation; for example, 2^3 = 8.

Figure 7.

Statistical estimation information.
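The formula in Figure 7 is the product form of NH 3 in Table 5: by the product transformation in Table 3, its leading coefficient should equal exp(−10.047381) ≈ 0.0000433, which rounds to 0.000043. The Python sketch below evaluates both forms for one illustrative set of measures (bpd = 90, hc = 320, fl = 70, ac = 310 mm, chosen by us and not taken from the sample) to show that they agree up to the rounding of that coefficient. The function names are hypothetical.

```python
import math

def weight_product_form(bpd, hc, fl, ac):
    """Figure 7 formula, with the rounded leading coefficient 0.000043."""
    return 0.000043 * bpd**1.948640 * hc**0.263745 * fl**0.601972 * ac**0.905524

def weight_log_form(bpd, hc, fl, ac):
    """NH 3 from Table 5: natural-log linear model, back-transformed with exp."""
    return math.exp(-10.047381 + 1.94864 * math.log(bpd) + 0.263745 * math.log(hc)
                    + 0.601972 * math.log(fl) + 0.905524 * math.log(ac))
```

For these inputs both forms give a little under 3000 g; the gap of under 1% comes only from rounding exp(−10.047381) to 0.000043.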

Use case 3: Comparison among different formulas

There are many criteria for evaluating the efficiency and accuracy of estimation formulas, called evaluation criteria, for example, correlation coefficient, sum of residuals, and estimation error. Each formula has its own strong points and drawbacks: a formula may be better than another in terms of some criteria but worse in terms of others. An optimal formula is one that has more strong points than drawbacks across most criteria. Hence, Phoebe framework supports comparison among different formulas via the evaluation matrix shown in Figure 8. Each row of the evaluation matrix represents a formula, whereas each column represents a criterion. For example, the first, second, and third rows represent three formulas in the form of a logarithm function, an exponent function, and a linear function, respectively. The four criteria multivariate correlation, estimation correlation, error range, and ratio error range are arranged in the respective columns.

Figure 8.

Comparison among different formulas.

Tables 4–8, discussed in Section 5 (Experimental results), are numeric interpretations of the evaluation matrix in Figure 8.
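As a small illustration of why no single criterion settles the comparison, the Python sketch below builds an evaluation matrix from the R and error-range values reported in Table 5 and ranks the formulas column by column. The ranking rules (larger R, smaller |μ|, smaller σ are better) follow the definitions given in Section 5; the names `EVALUATION`, `CRITERIA`, and `rank` are hypothetical helpers, not part of Phoebe framework.

```python
# Evaluation matrix: one row per formula, columns (R, mu, sigma); values from Table 5.
EVALUATION = {
    "NH 3": (0.9636, -7.4656, 212.5573),
    "NH 4": (0.9635, -6.0901, 214.1153),
    "Sherpard": (0.9619, -65.8121, 219.0392),
    "Ho 2": (0.9602, -11.5576, 223.5124),
    "Hadlock": (0.9395, -76.4960, 272.9474),
    "Campbell and Wilkin": (0.9215, 68.1261, 308.5728),
}

# Sort key per criterion; a smaller key means a better formula.
CRITERIA = {
    "R": lambda row: -row[0],          # larger correlation is better
    "|mu|": lambda row: abs(row[1]),   # mean error closer to zero is better
    "sigma": lambda row: row[2],       # smaller spread of errors is better
}

def rank(criterion):
    """Formula names ordered from best to worst on one column of the matrix."""
    key = CRITERIA[criterion]
    return sorted(EVALUATION, key=lambda name: key(EVALUATION[name]))
```

Here NH 3 ranks first by R and by σ, whereas NH 4 ranks first by |μ|, which is the kind of trade-off the evaluation matrix is meant to expose.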

| Formula | Expression | R | Error range (weeks) |
|---|---|---|---|
| NH 1 | log(age) = 2.419638 + 0.002012 * bpd + 0.000934 * hc + 0.00547 * fl + 0.001042 * ac | 0.9303 | −0.0292 ± 1.4500 |
| NH 2 | age = −3.364759 + 0.056285 * bpd + 0.034697 * hc + 0.188156 * fl + 0.035304 * ac | 0.9285 | 0 ± 1.4682 |
| Ho 1 | age = 331.022308 - 1.611774 * (hc + ac) + 0.00278 * ((hc + ac)^2) - 0.000002 * ((hc + ac)^3) | 0.9212 | 0 ± 1.5384 |
| Varol 6 | age = 11.769 + 1.275 * fl/10 + 0.449 * ((fl/10)^2) - 0.02 * ((fl/10)^3) | 0.8949 | −1.6807 ± 1.8525 |
| Varol 1 | age = 5.596 + 0.941 * ac/10 | 0.8941 | −0.5683 ± 1.7711 |
| Varol 5 | age = 1.863 + 6.280 * fl/10 - 0.211 * ((fl/10)^2) | 0.8934 | −1.5182 ± 2.1150 |

Table 4.

Comparison of age estimation with 2D sample.

The sign “^” denotes the exponent operator. The formulas are written in a flexible plain-text template so that they can be entered into any computational tool. Table 5 shows a comparison between our best weight formula and the others with the 2D sample. As seen in Table 5, our formula is the best, with R = 0.9636 and error range −7.4656 ± 212.5573 g.

| Formula | Expression | R | Error range (g) |
|---|---|---|---|
| NH 3 | log(weight) = −10.047381 + 1.94864 * log(bpd) + 0.263745 * log(hc) + 0.601972 * log(fl) + 0.905524 * log(ac) | 0.9636 | −7.4656 ± 212.5573 |
| NH 4 | log(weight) = 3.957543 + 0.02373 * bpd + 0.000802 * hc + 0.009403 * fl + 0.003157 * ac | 0.9635 | −6.0901 ± 214.1153 |
| Sherpard | weight = 10^(1.2508 + 0.166 * bpd/10 + 0.046 * ac/10 - 0.002646 * ac * bpd/100) | 0.9619 | −65.8121 ± 219.0392 |
| Ho 2 | weight = 10^(1.746 + 0.0124 * bpd + 0.001906 * ac) | 0.9602 | −11.5576 ± 223.5124 |
| Hadlock | weight = 10^(1.304 + 0.05281 * ac/10 + 0.1938 * fl/10 - 0.004 * ac * fl/100) | 0.9395 | −76.4960 ± 272.9474 |
| Campbell and Wilkin | weight = 1000 * exp(−4.564 + 0.282 * ac/10 - 0.00331 * ac * ac/100) | 0.9215 | 68.1261 ± 308.5728 |

Table 5.

Comparison of weight estimation with 2D sample.

| Formula | Expression | R | Error range (weeks) |
|---|---|---|---|
| NH 5 | age = 20.759763 + 0.170859 * (thigh_vol + arm_vol) - 0.000545 * ((thigh_vol + arm_vol)^2) + 0.000001 * ((thigh_vol + arm_vol)^3) | 0.9970 | 0 ± 0.2696 |
| NH 6 | age = 21.816252 + 0.137531 * (thigh_vol + arm_vol) - 0.000228 * ((thigh_vol + arm_vol)^2) | 0.9969 | 0 ± 0.2752 |
| Ho 3 | age = 21.1148 + 0.2381 * thigh_vol - 0.001 * (thigh_vol^2) + 0.000002 * (thigh_vol^3) | 0.9960 | −0.0150 ± 0.3173 |
| Ho 4 | age = 167.079079 - 1.553705 * ac + 0.005559 * (ac^2) - 0.000006 * (ac^3) | 0.8482 | 0.3723 ± 1.8985 |

Table 6.

Comparison of age estimation with 3D sample.

| Formula | Expression | R | Error range (g) |
|---|---|---|---|
| NH 7 | weight = −3617.936175 + 0.513171 * hc + 1.960176 * ac + 39.804645 * bpd + 17.016936 * fl + 8.366404 * thigh_vol + 5.828808 * arm_vol | 0.9708 | −0.0001 ± 180.9803 |
| NH 8 | weight = −3626.314419 + 43.426744 * bpd + 23.645338 * fl + 11.414273 * thigh_vol | 0.9698 | 0 ± 184.0439 |
| Ho 5 | weight = −3306 + 55.477 * bpd + 13.483 * thigh_vol | 0.9663 | −0.0072 ± 194.0956 |
| Lee 3 | weight = exp(0.5046 + 1.9665 * log(bpd/10) - 0.3040 * (log(bpd/10)^2) + 0.9675 * log(ac/10) + 0.3557 * log(arm_vol)) | 0.9620 | 247.8761 ± 206.1607 |
| Lee 5 | weight = exp(2.1264 + 1.1461 * log(ac/10) + 0.4314 * log(thigh_vol)) | 0.9514 | 289.2660 ± 234.0763 |
| Lee 2 | weight = exp(−3.6138 + 4.6761 * log(ac/10) - 0.4959 * (log(ac/10)^2) + 0.3795 * log(arm_vol)) | 0.9472 | 316.4974 ± 242.7964 |
| Ho 6 | weight = −882.7049 + 73.9955 * thigh_vol - 0.497 * (thigh_vol^2) + 0.0014 * (thigh_vol^3) | 0.9385 | −7.5001 ± 260.4596 |
| Lee 4 | weight = exp(4.7806 + 0.7596 * log(thigh_vol)) | 0.9298 | 737.4932 ± 344.1904 |
| Lee 1 | weight = exp(4.9588 + 1.0721 * log(arm_vol) - 0.0526 * (log(arm_vol)^2)) | 0.9281 | 867.0836 ± 309.5779 |
| Chang | weight = 1080.8735 + 22.44701 * thigh_vol | 0.9229 | 456.5168 ± 298.2517 |

Table 7.

Comparison of weight estimation with 3D sample.

| Formula | Sample | Expression | R | Error range (g) |
|---|---|---|---|---|
| NHD 1 | 2D | log10(weight) = −3.715073 + 1.873457 * log10(bpd) + 0.363783 * log10(fl) + 0.691683 * log10(ac) + 0.722245 * log10(age) | 0.9674 | −5.6422 ± 202.0395 |
| NHD 2 | 2D | log10(weight) = −3.761798 + 2.001731 * log10(bpd) + 0.811078 * log10(ac) + 0.826279 * log10(age) | 0.9667 | −5.6111 ± 204.1477 |
| NHD 3 | 3D | weight = −4988.000528 + 66.374156 * age + 0.370084 * hc + 1.943247 * ac + 39.464816 * bpd + 13.215505 * fl + 3.658463 * thigh_vol | 0.9715 | 0 ± 178.8091 |
| NHD 4 | 3D | weight = −4982.099978 + 68.089354 * age + 2.001675 * ac + 39.85375 * bpd + 13.229377 * fl + 3.619405 * thigh_vol | 0.9714 | 0 ± 178.9114 |
| NH 3 | 2D | log(weight) = −10.047381 + 1.94864 * log(bpd) + 0.263745 * log(hc) + 0.601972 * log(fl) + 0.905524 * log(ac) | 0.9636 | −7.4656 ± 212.5573 |
| NH 4 | 2D | log(weight) = 3.957543 + 0.02373 * bpd + 0.000802 * hc + 0.009403 * fl + 0.003157 * ac | 0.9635 | −6.0901 ± 214.1153 |
| NH 7 | 3D | weight = −3617.936175 + 0.513171 * hc + 1.960176 * ac + 39.804645 * bpd + 17.016936 * fl + 8.366404 * thigh_vol + 5.828808 * arm_vol | 0.9708 | −0.0001 ± 180.9803 |
| NH 8 | 3D | weight = −3626.314419 + 43.426744 * bpd + 23.645338 * fl + 11.414273 * thigh_vol | 0.9698 | 0 ± 184.0439 |

Table 8.

Weight estimation dual formulas.

5. Experimental results

We conducted experiments with Phoebe framework in order to find optimal formulas for estimating fetal weight and age that are most appropriate to our gestational samples. We use two samples: the first includes two-dimensional (2D) ultrasound measures of 1027 cases, and the second includes three-dimensional (3D) ultrasound measures of 506 cases. Ho and Phan [15, 16] collected these samples from pregnant women at Vinh Long General Hospital, Vietnam, in strict compliance with all medical ethics criteria. These women and their husbands are Vietnamese. Their menstrual periods are regular, and the dates of their last periods are known. Each of them carries a single live fetus. Fetal age ranges from 28 to 42 weeks. Delivery took place no more than 48 h after the ultrasound scan. Measures in the 2D sample are bpd, hc, ac, and fl. Measures in the 3D sample are bpd, hc, ac, fl, thigh_vol, and arm_vol. The unit of bpd, hc, ac, and fl is millimeter. The unit of thigh_vol and arm_vol is cm³. The units of fetal age and fetal weight are week and gram, respectively. The experimental results in this section were also published in our article “Experimental Results of Phoebe Framework: Optimal Formulas for Estimating Fetus Weight and Age” [11].

The proposed framework can produce highly accurate formulas. Given the two gestational samples, we compare our optimal formulas with the others according to metrics such as estimation correlation and estimation error range. Let Y = {y1, y2, …, yn} and Z = {z1, z2, …, zn} be the sample fetal ages/weights and the estimated fetal ages/weights, respectively. The estimation correlation, denoted R, is the correlation coefficient between the sample response values and the estimated response values, according to Eq. (1). The correlation R reflects the adequacy of a given formula: the larger R is, the better the formula is.

$$R = \frac{\sum_{i=1}^{n} (y_i - \bar{y})(z_i - \bar{z})}{\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2 \, \sum_{i=1}^{n} (z_i - \bar{z})^2}} \qquad (1)$$

where $\bar{y}$ and $\bar{z}$ are the means of Y and Z, respectively.

An estimation error, denoted di, is the deviation between zi and yi. The estimation error mean, denoted μ, is the mean of the errors and reflects the accuracy of a given formula: the smaller the absolute value of μ, the more accurate the formula. If μ is positive, the formula tends to overestimate; if μ is negative, it tends to underestimate. The standard deviation σ of the estimation errors reflects the stability of a given formula: the smaller σ is, the more stable the formula is. The error mean μ and standard deviation σ are combined into a so-called error range. Eq. (2) shows how to calculate μ, σ, and the error range.

$$d_i = z_i - y_i, \qquad \mu = \frac{1}{n}\sum_{i=1}^{n} d_i, \qquad \sigma = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (d_i - \mu)^2}, \qquad \text{error range} = \mu \pm \sigma \qquad (2)$$

For example, if μ = −0.0292 and σ = 1.45, the error range is −0.0292 ± 1.45, which means that the estimation error ranges from −1.4792 = −0.0292 − 1.45 to 1.4208 = −0.0292 + 1.45. The error range reflects both the adequacy and the accuracy of a given formula.
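The error-range computation and the worked example translate directly into a few lines of Python. This sketch is illustrative: the function names are hypothetical, and it assumes the n − 1 (sample) denominator for σ.

```python
import math

def error_range(y_real, y_estimated):
    """Return (mu, sigma) of the estimation errors d_i = z_i - y_i."""
    d = [z - y for y, z in zip(y_real, y_estimated)]
    n = len(d)
    mu = sum(d) / n
    # Assumption: sample standard deviation with the n - 1 denominator.
    sigma = math.sqrt(sum((x - mu) ** 2 for x in d) / (n - 1))
    return mu, sigma

def interval(mu, sigma):
    """Lower and upper bounds of the error range mu ± sigma."""
    return mu - sigma, mu + sigma
```

Calling `interval(-0.0292, 1.45)` reproduces the bounds −1.4792 and 1.4208 of the example above; a positive `mu` indicates overestimation because errors are taken as estimated minus real.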

Table 4 shows a comparison between our best age formula and the others with the 2D sample. By convention, each formula is named after the respective author listed in the references section; for example, formula “Ho 1” is the first formula of the author Ho [4]. As seen in Table 4, our formula is the best, with R = 0.9303 and error range −0.0292 ± 1.4500 weeks. By convention, our formulas have names with the prefix “NH”.

Table 6 shows a comparison between our best age formula and the others with the 3D sample. As seen in Table 6, our formula is the best, with R = 0.9970 and error range 0 ± 0.2696 weeks.

Table 7 shows a comparison between our best weight formula and the others with the 3D sample. As seen in Table 7, our formula is the best, with R = 0.9708 and error range −0.0001 ± 180.9803 g.

Within the context of this research, from the section on 3D ultrasound in the PhD dissertation of Ho [4], we recognize that fetal weight and fetal age are mutually dependent: when fetal age increases, fetal weight increases too. As a result, weight estimation improves significantly if fetal age is known beforehand. If fetal age is added as a regression variable (regressor) to the regression model of fetal weight, the resulting weight estimation formula, called a dual formula, is even better than the most optimal ones shown in Tables 5 and 7. Such a dual formula is not only precise but also practical, because many pregnant women know their gestational age before taking an ultrasound examination. Given the 2D sample and the 3D sample, Table 8 compares dual formulas with the most optimal ones shown in Tables 5 and 7 with regard to R and error range. By convention, our dual formulas have names with the prefix “NHD”. The notation “log10” denotes the base-10 logarithm.

In Table 8, all dual formulas NHD are better than the normal formulas NH with regard to R and error range. Moreover, the NHD formulas do not need many more regressors. Given the 2D sample, NHD 1 and NHD 2 use 4 and 3 regressors, respectively, including the age regressor, whereas both NH 3 and NH 4 use 4 regressors. Given the 3D sample, NHD 3 and NHD 4 use 6 and 5 regressors, respectively, including the age regressor, whereas NH 7 and NH 8 use 6 and 3 regressors, respectively.

Although our formulas outperform all the others, with high adequacy (large R) and high accuracy (small error range), other studies remain significant because their formulas are very simple and practical. Moreover, our formulas are not global. If they are applied to samples collected in other communities, their accuracy may decrease and they may no longer be better than traditional formulas such as those of Shepard and Hadlock. However, our experimental results indicate that when Phoebe framework is applied to the same samples as other studies, it produces better formulas. To move toward global optimality with Phoebe framework, we make two essential suggestions:

  • Experimenting with Phoebe framework on many samples.

  • Adding more knowledge of pregnancy study, ultrasound technique, and obstetrics into Phoebe framework. In other words, the additional knowledge will be modeled as constraints of the SG algorithm.

These suggestions go beyond the scope of this research. In our opinion, absolute global optimality cannot be reached because Phoebe framework focuses on local optimality within specific communities. Essentially, the suggestions only alleviate the weakness of the built-in SG algorithm with respect to global optimality.


6. A proposal of early weight estimation

The ultrasound samples used here were collected at fetal ages from 28 to 42 weeks because each delivery took place no more than 48 h after the last ultrasound scan. Hence, the accuracy of weight estimation is only ensured when ultrasound examinations are performed after 28 weeks of fetal age. This section proposes an early weight estimation method, in which ultrasound measures can be taken before 28 weeks of fetal age. We cannot yet guarantee an improvement in estimation accuracy because we have not yet experimented with the proposal, but the gestational sample can be collected at any appropriate time point in the gestational period. In other words, the sample may lack fetal weights. This is convenient for practitioners because they need not be concerned with fetal weights when taking ultrasound examinations. Consequently, early weight estimation is achieved. By convention, vectors are column vectors unless stated otherwise.

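Without loss of generality, the regression models are linear, such as $Y = \alpha_0 + \alpha_1 X_1 + \alpha_2 X_2 + \dots + \alpha_n X_n$ and $Z = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n$, where $Y$ is fetal age, $Z$ is fetal weight, and the $X_i$ are gestational ultrasound measures such as bpd, hc, ac, and fl. Suppose both $Y$ and $Z$ are normally distributed, according to Eq. (3) ([17] pp. 8–9).

$$P(Y \mid X, \alpha) = \frac{1}{\sqrt{2\pi\sigma_1^2}} \exp\left(-\frac{(Y - \alpha^T X)^2}{2\sigma_1^2}\right), \qquad P(Z \mid X, \beta) = \frac{1}{\sqrt{2\pi\sigma_2^2}} \exp\left(-\frac{(Z - \beta^T X)^2}{2\sigma_2^2}\right) \tag{3}$$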

where $\alpha = (\alpha_0, \alpha_1, \dots, \alpha_n)^T$ and $\beta = (\beta_0, \beta_1, \dots, \beta_n)^T$ are parameter vectors and $X = (1, X_1, X_2, \dots, X_n)^T$ is the data vector. The means of $Y$ and $Z$ are $\alpha^T X$ and $\beta^T X$, respectively, whereas the variances of $Y$ and $Z$ are $\sigma_1^2$ and $\sigma_2^2$, respectively. Note that the superscript “T” denotes the transposition operator for vectors and matrices. Let $D = (X, y, z)$ be the collected sample, in which $X$ is the set of sample measures, $y$ is the set of sample fetal ages, and $z$ is the set of fetal weights; note that $z$ may be missing (empty) or incomplete. If $z$ is empty, there is no $z_i$ in $z$. If $z$ is incomplete, $z$ has some values but also some missing values. However, $y$ must be complete, which means that all pregnant women within the research know their gestational age. We now focus on estimating $\alpha$ and $\beta$ based on $D$. By convention, let $\alpha^*$ and $\beta^*$ be the estimates of $\alpha$ and $\beta$, respectively ([17] p. 8).


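Given $X$, the joint probability of $Y$ and $Z$ is the product of the probability of $Y$ given $X$ and the probability of $Z$ given $X$, because $Y$ and $Z$ are conditionally independent given $X$, according to Eq. (4).

$$P(Y, Z \mid X, \alpha, \beta) = P(Y \mid X, \alpha)\, P(Z \mid X, \beta) \tag{4}$$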

The conditional expectation of the sufficient statistic $Z$ given $X$ with regard to $P(Z \mid X, \beta)$ is specified by Eq. (5).

$$E(Z \mid X) = \int Z\, P(Z \mid X, \beta)\, dZ = \beta^T X \tag{5}$$

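Eqs. (3)–(5) can be evaluated directly. The sketch below writes them as plain Python functions. It assumes that the data vector is passed as a list starting with 1, matching $X = (1, X_1, \dots, X_n)^T$, and that the variances $\sigma_1^2$ and $\sigma_2^2$ are given. The function names and all coefficient values in the usage lines are hypothetical.

```python
import math


def normal_pdf(v, mean, var):
    """Density of the normal distribution N(mean, var) at v."""
    return math.exp(-(v - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def linear_mean(coef, x):
    """alpha^T X (or beta^T X) for a data vector x = [1, X1, ..., Xn]."""
    return sum(c * v for c, v in zip(coef, x))


def p_age_given_x(y, x, alpha, var1):
    """Eq. (3), first part: P(Y | X, alpha)."""
    return normal_pdf(y, linear_mean(alpha, x), var1)


def p_weight_given_x(z, x, beta, var2):
    """Eq. (3), second part: P(Z | X, beta)."""
    return normal_pdf(z, linear_mean(beta, x), var2)


def p_joint_given_x(y, z, x, alpha, beta, var1, var2):
    """Eq. (4): Y and Z are conditionally independent given X."""
    return p_age_given_x(y, x, alpha, var1) * p_weight_given_x(z, x, beta, var2)


def expected_weight_given_x(x, beta):
    """Eq. (5): E(Z | X) = beta^T X."""
    return linear_mean(beta, x)


# Hypothetical coefficients for a single measure X1 (illustration only).
alpha = [20.0, 0.1]
beta = [-2000.0, 40.0]
x = [1.0, 90.0]
print(expected_weight_given_x(x, beta))  # 1600.0
print(p_joint_given_x(29.0, 1600.0, x, alpha, beta, 1.0, 200.0 ** 2))
```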
When $Z$ is a hidden variable, there is a latent dependence between $Y$ and $Z$, which is specified by the joint probability of $Y$ and $Z$.


The variables $Y$ and $Z$ have different units of measure. For instance, the unit of $Y$ is the week, whereas the unit of $Z$ is the gram. Suppose $Y$ is considered a discrete variable whose values range from 1 to $K$, where $K$ can be up to 42, for example. Then $P(Y)$ becomes the parameter $\theta_Y$, which is the probability of $Y$ for $Y$ from 1 to $K$.


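For each value of $Y$, suppose the conditional probability $P(Z \mid Y)$ is normally distributed with mean $\mu_Y$ and variance $\sigma_Y^2$. Eq. (6) specifies the joint probability $P(Y, Z)$.

$$P(Y, Z \mid \theta_Y, \mu_Y, \sigma_Y^2) = P(Y)\, P(Z \mid Y) = \theta_Y \frac{1}{\sqrt{2\pi\sigma_Y^2}} \exp\left(-\frac{(Z - \mu_Y)^2}{2\sigma_Y^2}\right) \tag{6}$$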

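The conditional expectation of the sufficient statistic $Z$ given $Y$ with regard to $P(Z \mid Y, \mu_Y, \sigma_Y^2)$ is specified by Eq. (7).

$$E(Z \mid Y) = \int Z\, P(Z \mid Y, \mu_Y, \sigma_Y^2)\, dZ = \mu_Y \tag{7}$$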

Please pay attention to Eq. (7), because $Z$ will later be estimated by this expectation. Eq. (8) specifies the expectation of the sufficient statistic $Z$ with regard to $P(Y, Z \mid \theta_Y, \mu_Y, \sigma_Y^2)$.

$$E(Z) = \sum_{Y=1}^{K} \theta_Y \mu_Y \tag{8}$$

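This follows from:

$$E(Z) = \sum_{Y=1}^{K} \int Z\, P(Y, Z \mid \theta_Y, \mu_Y, \sigma_Y^2)\, dZ = \sum_{Y=1}^{K} \theta_Y \int Z\, P(Z \mid Y, \mu_Y, \sigma_Y^2)\, dZ = \sum_{Y=1}^{K} \theta_Y \mu_Y$$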

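Eqs. (6)–(8) describe the latent age-weight model. The sketch below evaluates them using dictionaries keyed by gestational week. The function names and the values of $\theta_Y$, $\mu_Y$, and $\sigma_Y^2$ in the usage lines are hypothetical, chosen only to show that $E(Z)$ is the $\theta_Y$-weighted average of the per-week means $\mu_Y$.

```python
import math


def normal_pdf(v, mean, var):
    """Density of the normal distribution N(mean, var) at v."""
    return math.exp(-(v - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def p_age_weight(y, z, theta, mu, var):
    """Eq. (6): P(Y, Z) = theta_Y * N(Z; mu_Y, sigma_Y^2)."""
    return theta[y] * normal_pdf(z, mu[y], var[y])


def expected_weight_given_age(y, mu):
    """Eq. (7): E(Z | Y) = mu_Y."""
    return mu[y]


def expected_weight(theta, mu):
    """Eq. (8): E(Z) = sum over Y of theta_Y * mu_Y."""
    return sum(theta[y] * mu[y] for y in theta)


# Hypothetical two-week example (weeks 30 and 31).
theta = {30: 0.25, 31: 0.75}
mu = {30: 1400.0, 31: 1800.0}
var = {30: 150.0 ** 2, 31: 150.0 ** 2}
print(expected_weight(theta, mu))          # 1700.0
print(expected_weight_given_age(30, mu))   # 1400.0
```

Summing $Z \cdot P(Y, Z)$ numerically over a fine grid of $Z$ for one week reproduces $\theta_Y \mu_Y$, which is the per-week term in the derivation of Eq. (8).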
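The full joint probability of $Y$ and $Z$ given $X$ and the parameters $\alpha$, $\beta$, $\theta_Y$, $\mu_Y$, and $\sigma_Y^2$ is the product specified by Eq. (9).

$$P(Y, Z \mid X, \alpha, \beta, \theta_Y, \mu_Y, \sigma_Y^2) = P(Y, Z \mid X, \alpha, \beta)\, P(Y, Z \mid \theta_Y, \mu_Y, \sigma_Y^2) \tag{9}$$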

where $P(Y, Z \mid X, \alpha, \beta)$ and $P(Y, Z \mid \theta_Y, \mu_Y, \sigma_Y^2)$ are specified by Eqs. (4) and (6), respectively. Eq. (9) captures both the explicit dependence between $Y$ and $Z$ via $P(Y, Z \mid X, \alpha, \beta)$ and the implicit dependence via $P(Y, Z \mid \theta_Y, \mu_Y, \sigma_Y^2)$. Explicit dependence and implicit dependence have equal influence on $Z$ if $E(Z \mid X)$ specified by Eq. (5) is equal to $E(Z)$ specified by Eq. (8), according to Eq. (10).

$$\beta^T X = \sum_{Y=1}^{K} \theta_Y \mu_Y \tag{10}$$

Given the sample $D$ of $N$ observations, all $\theta_Y$ become constants determined by Eq. (11).

$$\theta_Y = \frac{\text{number of } y_i = Y}{N} \tag{11}$$
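Eq. (11) is a relative frequency, so it can be computed by counting. Below is a minimal sketch, assuming the sample ages are whole gestational weeks; the function name `estimate_theta` is hypothetical. Weeks that never occur in the sample simply get no entry, since their $\theta_Y$ would be 0.

```python
from collections import Counter


def estimate_theta(ages):
    """Eq. (11): theta_Y = (number of y_i equal to Y) / N for each observed week Y."""
    n = len(ages)
    counts = Counter(ages)
    return {week: count / n for week, count in sorted(counts.items())}


print(estimate_theta([30, 30, 31, 32]))  # {30: 0.5, 31: 0.25, 32: 0.25}
```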

For convenience, let $\Theta = (\alpha, \beta, \mu_Y)^T$ be the compound parameter. The full joint probability specified by Eq. (9) is rewritten as follows:


(because all observations are independent and identically distributed)




By convention, if $\delta(y_i, Y) = 0$, that is, if $y_i \neq Y$, the respective probability $P(y_i, z_i \mid \mu_Y, \sigma_Y^2)$ is removed from the product. The log-likelihood function is the logarithm of the full joint probability, as follows:


Because $\log(2\pi)$ and $\theta_Y$ are constants, dropping them from the log-likelihood gives the reduced log-likelihood function in Eq. (12).


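Reading Eq. (9) as the product of Eqs. (4) and (6) over independent, identically distributed observations, the log-likelihood of a complete sample can be sketched as follows. This is a sketch under stated assumptions. The variances $\sigma_1^2$, $\sigma_2^2$, and $\sigma_Y^2$ are treated as known constants, since they are not part of $\Theta$. The function `reduced_log_likelihood` drops every term that does not depend on $\Theta$; this is our reading of Eq. (12), not the authors' exact expression. All function names are hypothetical.

```python
import math


def linear_mean(coef, x):
    """alpha^T X (or beta^T X) for a data vector x = [1, X1, ..., Xn]."""
    return sum(c * v for c, v in zip(coef, x))


def log_normal(v, mean, var):
    """Log-density of N(mean, var) at v."""
    return -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)


def log_likelihood(xs, ys, zs, alpha, beta, mu, var1, var2, theta, var_age):
    """Log of the full joint probability for complete (x_i, y_i, z_i) observations.

    For each observation only the latent factor with Y == y_i is kept
    (delta(y_i, Y) = 1); all other factors are removed from the product.
    """
    total = 0.0
    for x, y, z in zip(xs, ys, zs):
        total += log_normal(y, linear_mean(alpha, x), var1)  # log P(y_i | x_i, alpha)
        total += log_normal(z, linear_mean(beta, x), var2)   # log P(z_i | x_i, beta)
        total += math.log(theta[y]) + log_normal(z, mu[y], var_age[y])  # log P(y_i, z_i)
    return total


def reduced_log_likelihood(xs, ys, zs, alpha, beta, mu, var1, var2, var_age):
    """Same quantity with all terms that do not depend on Theta = (alpha, beta, mu) dropped."""
    total = 0.0
    for x, y, z in zip(xs, ys, zs):
        total -= (y - linear_mean(alpha, x)) ** 2 / (2 * var1)
        total -= (z - linear_mean(beta, x)) ** 2 / (2 * var2)
        total -= (z - mu[y]) ** 2 / (2 * var_age[y])
    return total
```

Changing $\alpha$, $\beta$, or $\mu_Y$ shifts both functions by the same amount, so maximizing the reduced version yields the same $\Theta^*$. Every remaining term is a negative squared error, which is why setting the derivatives to zero produces linear equations.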
The optimal estimate $\Theta^*$ is a maximizer of $l(\Theta)$, according to Eq. (13) ([17] p. 9).

$$\Theta^* = \underset{\Theta}{\arg\max}\; l(\Theta) \tag{13}$$

By taking the first-order partial derivatives of $l(\Theta)$ with respect to $\Theta$ ([18] p. 34), we obtain:


When the first-order partial derivatives of $l(\Theta)$ are equal to zero, $l(\Theta)$ reaches a local maximum. In other words, $\Theta^*$ is the solution of equation system 14, which results from setting these derivatives to zero and setting $E(Z \mid X) = E(Z)$.




The notation $\mathbf{0} = (0, 0, \dots, 0)^T$ denotes the zero vector. All equations in system 14 are linear, with unknowns $\Theta = (\alpha, \beta, \mu_Y)^T$. The last equation in system 14 is Eq. (10), under the heuristic assumption that explicit dependence and implicit dependence have equal influence on $Z$. This last equation is only used to adjust the $\mu_Y$ values if the heuristic assumption is adopted; otherwise it is ignored.

We apply the expectation maximization (EM) algorithm to estimate $\Theta = (\alpha, \beta, \mu_Y)^T$ when fetal weights are missing. Note that the full joint probability $P(Y, Z \mid X, \alpha, \beta, \mu_Y)$ specified by Eq. (9) is a product of distributions from the regular exponential family. The EM algorithm runs over many iterations, and each iteration has an expectation step (E-step) and a maximization step (M-step) for estimating parameters. Given the current parameter $\Theta^t = (\alpha^t, \beta^t, \mu_Y^t)^T$ at the $t$th iteration, the two steps are shown in Table 9 ([19] p. 4).

  1. E-step: Estimate only the missing values $z_i$, each as its expectation based on the current mean $\mu_{y_i}^t$, according to Eq. (7). Note that each missing value $z_i$ is always associated with an observation $y_i$: $z_i = E(z_i \mid y_i) = \mu_{y_i}^t$.

  2. M-step: The next parameter $\Theta^{t+1}$ is a maximizer of $l(\Theta)$, which is the solution of equation system 14. Note that $\Theta^{t+1}$ becomes the current parameter for the next iteration.

Table 9.

E-step and M-step of EM algorithm.

Equation system 14 is solvable because the missing values $z_i$ were estimated in the E-step. The EM algorithm stops if, at some $t$th iteration, $\Theta^t = \Theta^{t+1} = \Theta^*$. At that point, $\Theta^* = (\alpha^*, \beta^*, \mu_Y^*)^T$ is the optimal estimate of the EM algorithm, and hence the linear regression functions of $Y$ and $Z$ are determined by $\alpha^*$ and $\beta^*$.

Usually, all parameters change after every iteration of the EM algorithm. Fortunately, however, $\alpha^*$ is determined as a partial solution of equation system 14 at the first iteration of the EM process, because both $X$ and $y$ are complete. In other words, $\alpha^*$ is fixed, whereas $\beta$ and $\mu_Y$ change during the EM process. Eq. (15) ([20] p. 417) specifies $\alpha^*$.

$$\alpha^* = \left(X^T X\right)^{-1} X^T y \tag{15}$$

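where the superscript “−1” denotes matrix inversion. Here $X$ is the matrix whose rows are the data vectors of the sample, and $y$ is the vector of sample fetal ages.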

At the first iteration, $\Theta^1$ is usually initialized arbitrarily, but convergence of the EM algorithm can be improved by initializing $\mu_Y^1$ as a sample mean. Without loss of generality, suppose practitioners obtained $n < N$ fetal weights $z_1, z_2, \dots, z_n$ from $n$ ultrasound scans, and that all pregnant women in these $n$ scans have the same fetal age $Y$. Then $\mu_Y^1$ is initialized by Eq. (16).

$$\mu_Y^1 = \frac{1}{n} \sum_{i=1}^{n} z_i \tag{16}$$

The parameter $\beta^1$ at the first iteration is initialized according to previous studies in the literature.
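Combining Eqs. (7), (15), and (16) with Table 9, the EM procedure can be sketched as below. This is a simplified reading, not the authors' implementation. The variances are treated as fixed, and the Eq. (10) adjustment is ignored, which the text allows. The M-step therefore reduces to an ordinary least-squares fit of $\beta$ on the completed weights plus a per-week mean for $\mu_Y$. The loop stops when the parameters change by less than a tolerance rather than requiring exact equality. Under these simplifications the E-step uses only $\mu_Y$, so the initial $\beta^1$ does not affect the result. Several elements are hypothetical: the names `em_early_weight` and `week_means`, the per-week generalization of Eq. (16), the fallback value for weeks with no observed weight, and the toy data.

```python
def solve(a, b):
    """Solve the square linear system a @ w = b (Gaussian elimination, partial pivoting)."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def fit_ols(rows, targets):
    """Least squares via the normal equations (X^T X) w = X^T t, as in Eq. (15)."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xtt = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xtt)


def week_means(ys, zs, weeks):
    """Mean weight per gestational week, skipping missing (None) values."""
    means = {}
    for week in weeks:
        values = [z for y, z in zip(ys, zs) if y == week and z is not None]
        means[week] = sum(values) / len(values) if values else None
    return means


def em_early_weight(xs, ys, zs, tol=1e-9, max_iter=500):
    """Hypothetical EM sketch returning (alpha*, beta*, mu*) when some weights are missing.

    xs: data vectors [1, X1, ..., Xn]; ys: fetal ages in weeks (complete);
    zs: fetal weights in grams, None where the weight is missing.
    """
    observed = [z for z in zs if z is not None]
    if not observed:
        raise ValueError("this sketch needs at least one observed weight")
    alpha = fit_ols(xs, ys)  # Eq. (15): fixed for the whole run
    weeks = sorted(set(ys))
    fallback = sum(observed) / len(observed)  # hypothetical init for weeks without weights
    mu = {w: (m if m is not None else fallback)
          for w, m in week_means(ys, zs, weeks).items()}  # Eq. (16), per week
    beta = None
    for _ in range(max_iter):
        # E-step (Table 9): replace each missing z_i by E(z_i | y_i) = mu_{y_i}, Eq. (7)
        filled = [z if z is not None else mu[y] for y, z in zip(ys, zs)]
        # M-step: maximize the reduced log-likelihood on the completed data
        new_beta = fit_ols(xs, filled)
        new_mu = week_means(ys, filled, weeks)
        converged = beta is not None and max(
            [abs(p - q) for p, q in zip(new_beta, beta)]
            + [abs(new_mu[w] - mu[w]) for w in weeks]
        ) < tol
        beta, mu = new_beta, new_mu
        if converged:
            break
    return alpha, beta, mu


# Toy sample (invented): one measure, two gestational weeks, half the weights missing.
xs = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]
ys = [30, 30, 31, 31]
zs = [1500.0, None, 1700.0, None]
alpha_star, beta_star, mu_star = em_early_weight(xs, ys, zs)
print(alpha_star, beta_star, mu_star)
```

For this toy sample the sketch returns approximately $\alpha^* = (29.5, 0.4)$, $\beta^* = (1400, 80)$, and $\mu = \{30: 1500, 31: 1700\}$. It stops after the second iteration because Eq. (16) already places each $\mu_Y$ at the observed mean for its week.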

7. Conclusions

According to the experimental results, Phoebe framework produces optimal formulas with high adequacy and accuracy; please see Tables 4–8 for more details. However, we also recognize a weakness of our research: the built-in SG algorithm can lose some good formulas due to its heuristic conditions. The suggested solution is to add more constraints to these conditions; please read the article “A framework of fetal age and weight estimation” ([10] pp. 24–25) for more details. The proposal of early weight estimation actually uses an additional constraint, namely the latent relationship between fetal age and fetal weight. This latent relationship, represented by the joint probability of fetal age and weight, is one knowledge aspect of pregnancy study. In further research, we will experiment with the proposal and try our best to discover other knowledge aspects.

Another weakness of our research is that our complex formulas are difficult to apply in fast mental calculation; this is the price paid for their high accuracy. In the future, we will embed these formulas into the software or hardware of medical ultrasound machines so that users can easily read the estimated values produced by the machine.


We express our deep gratitude to Michael Thomas Flanagan (University College London) and Jos de Jong for providing helpful software packages that we used to implement the framework.

© 2018 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite and reference


Loc Nguyen, Truong-Duyet Phan and Thu-Hang T. Ho (August 1st 2018). Phoebe Framework and Experimental Results for Estimating Fetal Age and Weight, eHealth - Making Health Care Smarter, Thomas F. Heston, IntechOpen, DOI: 10.5772/intechopen.74883. Available from:
