A Distributed Benchmarking Framework for Actual ET Models

contains 23 related to the modeling and simulation of evapotranspiration (ET) and remote sensing-based energy balance determination of ET. These areas are at the forefront of technologies that quantify the highly spatial ET from the Earth's surface. The topics describe mechanics of ET simulation from partially vegetated surfaces and stomatal conductance behavior of natural and agricultural ecosystems. Estimation methods that use weather based methods, soil water balance, the Complementary Relationship, the Hargreaves and other temperature-radiation based methods, and Fuzzy-Probabilistic calculations are described. A critical review describes methods used in hydrological models. Applications describe ET patterns in alpine catchments, under water shortage, for irrigated systems, under climate change, and for grasslands and pastures. Remote sensing based approaches include Landsat and MODIS satellite-based energy balance, and the common process models SEBAL, METRIC and S-SEBS. Recommended guidelines for applying operational satellite-based energy balance models and for overcoming common challenges are made. following:


Introduction
With the various types of actual ET models being developed in the last 20 years, it becomes necessary to inter-compare methods. Most of already published ETa models comparisons address few number of models, and small to medium areas Gao & Long, 2008;García et al., 2007;Suleiman et al., 2008;Timmermans et al., 2007). With the large amount of remote sensing data covering the Earth, and the daily information available for the past ten years (i.e. Aqua/Terra-MODIS) for each pixel location, it becomes paramount to have a more complete comparison, in space and time. To address this new experimental requirement, a distributed computing framework was designed, and created. The design architecture was built from original satellite datasets to various levels of processing until reaching the requirement of various ETa models input dataset. Each input product is computed once and reused in all ETa models requiring such input. This permits standardization of inputs as much as possible to zero-in variations of models to the models internals/specificities.

Theoretical points of observation 2.1 Net radiation and soil heat flux
In the two-source energy balance approach, like TSEB and SEBS differ from the single-source concept of SEBAL and METRIC in the sense that the radiation and energy balances have separate formulations for either bare soil or canopy. The energy balance at any instantaneous moment is expressed by equation Eq. 1: Where Rn is Net Radiation, G is soil heat flux, H is sensible heat flux and LE is latent heat of vaporization. This is what is appearing in single-source models like SEBAL and METRIC. Single source models concentrate on identifying Rn and G from astronomical and semi-empirical equations respectively, while H is being iteratively solved based on thermodynamically exceptional geographical locations, often referred in literature (Bastiaanssen, 1995) as wet and dry pixels, also the technique to identify them is referred in more recent literature as end-members selection/identification (Timmermans et al., 2007).
In two-source models, it is separated into bare soil and canopy energy balances as in Eq. 2 and 3, respectively: Rns = G + Hs + LEs Where Rns is the net radiation of bare soil surface, Hs is the sensible heat flux from bare soil, LEs is the latent heat of vaporization from soil surface.
Where Rnc is the net radiation from canopy of crop, Hc is the sensible heat flux from canopy, LEc is the latent heat of vaporization of crop. Once the elements of those two equations are found, the fraction of vegetation cover (fc) is used to combine them into the area of a satellite remote sensing pixel, which is inherently a mixel of bare soil and canopy. The Net Radiation is partitioned according to the formulation commonly used in two-sources model (Eq. 4 and 5), where the soil partition of Rn is an LAI-based extinction coefficient (Choudhury, 1989) with a coefficient C ranging from 0.3 to 0.7 (Friedl, 2002), depending on the arrangement of the canopy elements. Friedl (2002) mentions that a canopy with spherical (random) leaf angle distribution would lead to a C value of 0.5.
Where LAI is the leaf area index, sunza is the sun zenith angle. Friedl (2002) mentions that he derived his soil heat flux formulation from his previous work (Friedl, 1996). It takes the already available soil fraction of net radiation and the cosine of the sun zenith angle (Eq. 6). A coefficient is then multiplied to those whereby soil type and moisture conditions are taken into consideration after (Choudhury et al., 1987).
Where Kg is the soil type and moisture condition coefficient in the soil heat flux. The Fraction of Vegetation cover is necessary to split the two-sources of heat transfer studied in such models. They are the soil surface (bare soil) and the vegetation canopy surface. The fraction of vegetation cover from Jia et al. (2003) quoting Baret et al. (1995) is developed as in Eq. 7: with K being taken as 0.4631 in Jia et al. (2003) and NDVImin at LAI=0 and NDVImax at LAI = +INF. As can be seen, a very large weight of potential deviation from the expected result is resting in the proper assessment of fc (Eq. 7). There are also uncertainties in the LAI raster input (Yang, Huang, Tan, Stroeve, Shabanov, Knyazikhin, Nemani & Myneni, 2006;Yang, Tan, Huang, Rautiainen, Shabanov, Wang, Privette, Huemmrich, Fensholt, Sandholt, Weiss, Ahl, Gower, Nemani, Knyazikhin & Myneni, 2006). The soil heat flux computed for Bastiaanssen (1995), is what could be called a partial contribution of soil heat flux to the energy balance of the pixel, as the semi-empirical relationship is proportional to various elements of thermodynamic forcing within each pixel (Eq. 8).
with T c the temperature in Celsius and r 0 the Albedo to apparent Albedo correction ranging 0.9 to 1.1 depending on the time of the day. SEBS uses a two-source Albedo anchors stretching equation multiplied by the soil fraction of the pixel to extract a percentage of the net radiation as soil heat flux (Eq. 9).
Generic values are Albedo dark = 0.05 and Albedo bright = 0.35, while adjustements are made when concentrating on a specific land use, eventually.

Monin-Obukhov Similarity Theory
The Monin-Obukhov Similarity Theory (Monin & Obukhov, 1954) is being used in single source and two-source energy balance models. It is interesting to note that Monin & Obukhov (1954), in the development of their Monin-Obukhov Similarity Theory (MOST) considered the friction velocity to be about 5% of the geostrophic wind velocity having an average speed of 10m/s results in the friction velocity being around 0.5 m/s, and with the Coriolis parameter l = 10 −4 s −1 and a tolerance of 20%, an estimate of the height of the surface layer is found at h=50m, that is also the DisALEXI blending height for air temperature (Norman et al., 2003). The dynamic velocity within this layer can be considered near to constant and the effect of Coriolis Force neglected (Monin & Obukhov, 1954). Under those conditions of neutral stratification the processes of turbulent mixing in the surface layer can be described by the logarithmic model of the boundary layer (Eq. 10).
with ψ m , ψ h the diabatic correction of momentum and heat through their changes of states, most x a MOST internal parameter, L the Monin-Obukhov Length (MOL), k is the von Karman constant, g the gravity acceleration, u is the wind speed, ρ is the air density, T is the temperature and h is the height of interest (measurement height of the wind speed, roughness length, etc.). Constraints to MOST as found in Bastiaanssen (1995) are of two types, first avoiding the latent heat flux input to be nil as its input location is in the denominator of the MOL equation (Equation Eq. 11), the second constraint is when the MOL is becoming positive, to force ψ m and ψ h to a ranged negative value (Bastiaanssen, 1995).
It turns out that Su (2002), extending the reach of his SEBS model to the GCM community has included a dual model for the convective processes within the Atmospheric Boundary Layer (ABL). Su (2002) followed the observations of Brutsaert (1999) that the ABL lower layer is either stable, either unstable and that the thickness of this lower layer is α = 10-15% of the ABL height, which is about β = 100-150 times the surface roughness. SEBS takes the highest from both as its estimation of h st , the height of ABL sublayer separation. If the reference height is lower than that, then the lower sublayer model is run, otherwise the upper sublayer models is used. The cutline between the two sublayers of the ABL permits SEBS to process the lower layer (Atmospheric Surface Layer, ASL) under the MOST paradigm (Eq. 12), whether it proves unstable (generally in the day) or stable (generally in the night). The momentum and heat functions for the upper sublayer of the ABL where flow is laminar (free convection) can then be merged with the ASL by what Su (2002) calls a Bulk ABL Similarity (BAS) stability correction set of functions called here ξ m and ξ h for momentum and heat respectively (Eq. 13).
with z 0 the surface roughness for momentum, h i the height of ABL or Planetary Boundary Layer (PBL). The formulation of ψ m and ψ h functions are inherited from Beljaars & Holtslag (1991) and include either correction weights inside the standard equations in some cases (unstable conditions of ψ m and ψ h ), either a polynomial with exponential in other cases (stable conditions of ψ m and ψ h ). However, Beljaars & Holtslag (1991) stated categorically that the data described are characteristic for grassland and agricultural land with sufficient water supply. Allen et al. (2005) mentions that METRIC and SEBAL do not require knowledge of crop type (no satellite based crop classification is needed). SEBAL relies on a type of semi-empirical equation relating NDVI to the roughness length (also called roughness height) for momentum and heat (Eq. 14).

Roughness height
Among many others, Chandrapala & Wimalasuriya (2003) proposed a = −5.5 and b = 5.8 for Sri Lanka using AVHRR NDVI images (sensor response curves, atmospheric correction and pixel size are all influencing NDVI response). Bastiaanssen (1995) preferred using extrema conditions to define a and b. SEBS takes a ground truth added to mapping point of view and uses a look up table to translate land use raster maps into roughness length.

Land Cover NDVI
Desert 0.02 0.002 Table 1. Boundary conditions on z 0m from NDVI satellite data after Bastiaanssen (1995) While SEBAL and METRIC use a fixed conversion rate between roughness length for momentum and roughness length for heat, Su (2002) is introducing in SEBS the use of the exponential of kB −1 (Eq. 15).
with h vegetation the height of the vegetation relating to the roughness length for momentum z 0m from Brutsaert (1982), z 0h the roughness length for heat, Su (2002) refers to the work of Massman (1999) on the combined von Karman constant -sublayer Stanton number (kB −1 ), where B −1 is defined by Gieske (2007) as in Eq. 16: is the roughness Stanton number, u * is the friction velocity, ρ the air density, C p the specific heat and ΔT the temperature difference.

Aerodynamic roughness for heat
The aerodynamic resistance (roughness) for heat r ah is an input to the sensible heat flux and is often a source of concentration in ET models based on energy balance. For logical reasons, as the parameterization of r ah needs prior knowledge of the state of the sensible heat flux to enable knowledge of the MOL to parameterize ψ m and ψ h the diabatic correction of momentum and heat through their changes of states. In turn, ψ m and ψ h , offset the logarithmic relation of the observation height to the respective roughness lengths (z 0m and z 0h ) being the driving force to curve the relation to the wind shear profile. SEBS resistance at the wet limiting case re wet (Eq. 17) is using the MOL as L wet configured to use all the energy available for evaporation, which in turn is used in conjunction with the reference height (z re f ) either in the computation of the ψ h or χ h whether ASL or BAS models are at work.
2.5 Ground to air temperature difference Allen et al. (2005) mentions that METRIC is a variant of the important model SEBAL and that it has been extended to provide tighter integration with ground-based reference ET. SEBAL formulation for the ground to air temperature difference (dT) is estimated (Eq. 18) as an affine function with two extreme conditions found in the satellite image processed (Bastiaanssen, 1995).
with r ah the aerodynamic resistance for heat, pixel cold and pixel hot the end-members defined in Bastiaanssen (1995); Bastiaanssen et al. (1998);Timmermans et al. (2007) and that are the signet ring of the model. METRIC formulation (Eq. 19) includes the reference ET paradigm found in Allen et al. (1998), the Kc × ETo crop ET, also called ETc, being translated into METRIC as k × ET r (Allen et al., 2005), practically using Alfalfa at full growth as anchor point in their Idaho study. Some extra metorological data being available when using METRIC, permits a daily surface soil water balance to be run to enforce conditions on the dry/hot pixel energy balance and effective dT. Selection of the extreme pixels is focused on cropped area as much as possible.
Where T s is the soil temperature, T v is the (virtual) vegetation canopy temperature, f c is the fraction of vegetation cover, DEM is the elevation and T c is the satellite sensed temperature in Celsius, h u is the wind speed measurement height. h disp is the displacement height being 0.65 of the canopy height in SEBS, Monin & Obukhov (1954) mention that for observations made at height superior to 1 meter, the displacement height can be nullified. The blending height (h pbl ) is given by an external mean or if no data is available, default value of 200m is used (same as in SEBAL) or 1000m.

Actual ET
Single sources models (METRIC and SEBAL) have promoted a particular way of closing the energy-balance (Eq. 21), using a ratio of fluxes called the evaporative fraction (Λ; Eq. 21).
This instantaneous ratio (could be called an efficiency, since it is unitless) is multiplied with a potential value of ET in order to provide with an Actual ET (equation Eq. 22).
METRIC (Allen et al., 2007) evaporative fraction is used with the potential ET based on Penman-Monteith method as published in Allen et al. (1998) in order to produce an actual ET estimation. In contrast SEBAL (Bastiaanssen, 1995;Bastiaanssen et al., 1998) evaporative fraction is used with the potential ET computed from exo-atmospheric solar radiation, a single-way atmospheric transmissivity, and the reflected proportion from Albedo. SEBS (Su, 2002) is computing an instantaneous latent heat flux (LE) from the evaporative fraction and the subtraction of net radiation and soil heat flux (Eq. 23). However, using what is called in SEBS the relative evaporation, the pixel value of H the sensible heat flux is allowed to vary only between H wet and H dry .
H wet is derived from the Priestley-Taylor equation (Priestley & Taylor, 1972), from which γ and slope come from, r e_wet is the resistance at the wet limiting case. E relative is the relative evaporation. SSEB (Senay et al., 2007) is evaluating a regional approximation of the evaporative fraction Λ as a ranging of satellite-based temperature products (Eq. 24) and uses the reference ET found in Allen et al. (1998) as a mean to compute the actual ET.

Conceptual design and processing flow
The distributed framework is a Linux system based on GDAL library (GDAL, 2011) and C programming, enhanced with a distributed language called OpenMP (OpenMP, 2011), used essentially for data distribution as seen in (Chemin, 2010). As the conceptual architecture of the framework (Fig. 1) signifies, there are several layers of processing involved. Initially, downloaded satellite imagery is located in a single directory (referred as RS data in Fig. 1).
During the pre-processing phase, a parsing agent will select images of the same kind and same date and stitch them together, then reproject them to the projection system selected. Upon completion, a second agent will perform a relatively conservative quality check and assign null value to failed pixels according to the satellite imagery information available.

Fig. 1. Architecture Concept of the Framework
Once these two steps are performed, the raster maps are tagged as products. Those products will be shared in between any of the ET models that require such type of input. Some ET models require some higher-level input raster maps, by this, we define higher-level product as a raster that requires at least one product as defined above as precursor to its creation.

Meteorological data
Meteorological data (referred as Point data in Fig. 1) is encoded with Fourier Transforms (FT) in function of cumulative day of year from the beginning of the satellite imagery data set. This insures both faithfulness of the data and high-portability as well as an elegant way to summarize a complex and variable non-spatial dataset. Actual state of the research in meteorological data time-series encoding for this framework is to develop an array of geo-tagged FT. Also under consideration are Wavelet Transforms (WT), for reasons of time tagging. This array will be used to interpolate on-the-fly from an appropriate number of neighbours and with an interpolation algorithm suiting best the operator requirement. The reason for this, is that it transfers the load from storage requirements (which can be heavy for daily meteorological raster datasets) to thread computing that is benefiting from distributed speed-up as only a marginal number of equations will be added by such process compared to the ET models processing load.

Implementation
Practically (Fig. 2), MODIS datasets are grouped by products and by day and batch processed each in one core of the computer in parallel. This involves format changing, merging tiles, reprojecting, renaming outputs according to the nomenclature of the processing system. The tools involved in that step are either standard Linux Shell tools, either part of GDAL (2011) standard tools (i.e. gdalwarp and gdal_translate). Both of these tools are still essentially sequential programs at this time, thus, they are being sent to each core in a distributed manner through the Shell with a check loop to ensure that there is at all time the same number of programs running as there are cores/threads available in the CPU architecture. It becomes clear that for each new leap in number of cores in future commercial offerings, the framework will automatically increase its processing capacity to the new enlarged number of cores/threads available, thus also reducing by the same factor the time needed to process a given number of satellite images.  Su (2002) and TSEB from both Kustas & Norman (1999) and Norman et al. (1995). One of the many candidate for inclusion was the Two-Source Algorithm (TSA) from Yunhao et al. (2005). After extensive calculus and referring to the mathematic academia, it became clear that both temperature equations below when combined to extract T vegetation and T soil (Eq. 25) have a large amount of solutions even within the constraining dimension of surface skin temperature range. The available information to possibly close the equations unknown parameters are relating to satellite T after Price (1984) and the satellite-based emissivity (ǫ), partitioned into ǫ vegetation = 0.93 & ǫ soil = 0.97 quoting Sui et al. (1997). Unless missing information or other sources of solution dimension constraints are published, the proposition of the model is so far considered impossible and not included in the framework until such condition is being met through publication.
With ǫ the satellite-based emissivity, ǫ vegetation & ǫ soil assumed fixed emissivity values for vegetation and bare soil (satellite response dependent), σ the Stephan-Boltzmann constant, T the satellite-based land surface temperature, and T vegetation & T soil the pixel vegetation and soil fractions temperatures. The first equation answers to the geographical two-source proportion within the pixel, while the second equation answers to the two-source flux merging according to Yunhao et al. (2005). Reference ET models included are Allen et al. (1998) from Cannata (2006), Priestley and Taylor (Priestley & Taylor, 1972) and Hargreaves , Modified Hargreaves (Droogers & Allen, 2002), Hargreaves-Samani (Hargreaves & Samani, 1985). Only the reference ET from Allen et al. (1998) is being used as a precursor of SSEB (Senay et al., 2007) and METRIC (Allen et al., 2007) actual ET. It was found preponderant to have a minimum group of reference ET models available as baseline for all the work, especially when looking into geographical areas where meteorological data has always been dominant in agricultural literature. Some models requiring operator intervention (SEBAL, METRIC) have add there internals modified with specially designed heuristics acting as operators. Initial developments were not looking into heuristics but stochastic algorithms. Some efforts using a genetic algorithm were eventually too expensive in processing time, while at the same time end-member selection information were becoming more common (Chandrapala & Wimalasuriya, 2003;Timmermans et al., 2007). Thus heuristics were designed and implemented on a regional basis, initially studied under the Greek conditions for the purpose of Alexandridis et al. (2009) andChemin et al. (2010). Eventually, the heuristics are extended to fit data sources, continent/climate combinations and model types on an adhoc basis as new regions are included into the geographical scope of research.

Initial results
In the case of SEBAL heuristic, the convergence reached 82% of the images processed for the Australian Murray-Darling Basin (1 Million Km 2 ), enabling the automatic processing of 3635 MODIS multi-tiles images within a single day of computing. Fig. 3 is the output from SEBAL with such heuristic for some irrigated areas in Australia, the total area being processed amounts to more than 5 Billions pixels of ETa values, being multiplied by as many temporary rasters and original data as required for each of the ET models. The Australian irrigation system (less than 100,000 ha) has a sharp, contrasted and well-defined pattern of water depletion, characteristic of continental dry climate with high water supply control for defined periods of the year where crops are in the field.  (2003)(2004) Looking into the matter of comparing ETa results from different ETa models, Fig. 4 is the averaged ETa output from two models (SEBAL and SSEB) over the tropical island of Sri Lanka in 2003 and 2004. It turns out that the relatively small island of Sri Lanka has an average ETa that is changing much more on a day to day basis than our previous example in Australia. Scale, climate, topography yield exposure to ocean events frequently, having drastic impact on thermodynamics of the island surface as the Fig. 5 also confirms. Changes between models of actual ET from SEBAL and SSEB are relatively constant throughout the RS modeling period. Actual ET from SSEB is in the upper range of SEBAL's one. The work of de Silva (1999) in the dry zone of Sri Lanka and the work of Hemakumara et al. (2003) in the wet zone of Sri Lanka are falling within the expected results found here. Likewise the average evaporative fractions found for Sri Lanka in Fig. 5 are especially leveraging the larger dry zone area of the island with value in the range of 0.3 to 0.5.

Conclusion
Challenges to experimentally compare ET models are immense, the theoretical points of comparison are sometimes clear, sometimes rather difficult to pinpoint. To try and address this situation, a framework for benchmarking ET actual models has been designed. Its implementation has embedded parallel data distribution at the base of each parts of the framework to remove the resistance of the data size to process large areas, high frequency and large time period with commonly available computers. Future work includes the finalization of SEBS (Su, 2002) and TSEB (Kustas & Norman, 1999) integration in the framework, looking for other ETa model candidates to add to existing ones. Also there is a need for designing and creating statistical tools to cross-compare several depths and layers of ETa models processing datasets. Finally, the use of OpenMPI (OpenMPI, 2011) is envisaged for concurrently running several ET models diagnostics in different multi-core machines or OpenCL (Khronos.org, 2011) kernel-based data distributed language to process all analysis as one large computation on a Graphical Processing Unit (GPU).

References
Alexandridis, T. K., Cherif, I., Chemin This edition of Evapotranspiration -Remote Sensing and Modeling contains 23 chapters related to the modeling and simulation of evapotranspiration (ET) and remote sensing-based energy balance determination of ET. These areas are at the forefront of technologies that quantify the highly spatial ET from the Earth's surface. The topics describe mechanics of ET simulation from partially vegetated surfaces and stomatal conductance behavior of natural and agricultural ecosystems. Estimation methods that use weather based methods, soil water balance, the Complementary Relationship, the Hargreaves and other temperatureradiation based methods, and Fuzzy-Probabilistic calculations are described. A critical review describes methods used in hydrological models. Applications describe ET patterns in alpine catchments, under water shortage, for irrigated systems, under climate change, and for grasslands and pastures. Remote sensing based approaches include Landsat and MODIS satellite-based energy balance, and the common process models SEBAL, METRIC and S-SEBS. Recommended guidelines for applying operational satellite-based energy balance models and for overcoming common challenges are made.