Generating Artificial Weather Data Sequences for Solar Distillation Numerical Simulations

Developing countries are concentrated in tropical climates, where solar radiation is abundant, so solar energy is a sustainable option for them. However, the daily or hourly measured solar irradiance data needed to design or simulate solar systems in these countries are not always available. This chapter therefore presents models to calculate daily and hourly radiation data from the monthly average daily radiation. First, the chapter describes the application of Aguiar's model to the calculation of daily radiation from monthly average daily radiation data. Next, it presents an improved Graham model to generate hourly radiation data series from monthly radiation data. The two models were used to generate daily and hourly radiation data series for Ho Chi Minh City and Da Nang, two cities representing two different tropical climates. The generated data series are tested by comparing their statistical parameters with those of the measured data series, and the comparison shows that the generated series have acceptable statistical accuracy. The generated radiation series are then used to run a simulation program for a solar water distillation system, and the simulation results are compared with those obtained from measured radiation data. This comparison again confirms the accuracy of the solar irradiance data generation model in this study. In particular, the model proposed here to generate sequences of hourly solar radiation values is much simpler than the original model of Graham. In addition, a model to generate hourly ambient temperature data from monthly average daily ambient temperature is also presented and tested.
Both the generated hourly solar radiation and ambient temperature sequences are then used to run a solar distillation simulation program whose outputs are monthly average daily distillate productivities. Finally, the outputs of the simulation program running with the generated solar radiation and ambient temperature data are compared with those running with measured data. The errors in predicted monthly average daily distillate productivity between measured and generated weather data are acceptably low in all cases. It can therefore be concluded that the models for generating artificial weather data sequences in this study can be used to run solar distillation simulation programs with acceptable accuracy.


Introduction
With the development of computers, simulation programs have become useful and indispensable tools for researchers and designers. They help users optimize design and system parameters without spending money on experimental models or time on experiments. Simulation programs in the field of solar water distillation are no exception. To run solar water distillation simulations, users need to provide weather data such as solar irradiance and ambient temperature at daily, hourly or shorter intervals, depending on the requirements of the software. However, hourly weather data are not always available, especially in developing countries, because measuring them requires equipment, time and money. According to Duffie and Beckman [1], these weather data must be collected for at least 8 years so that the average values are free of weather anomalies such as the El Nino and La Nina phenomena.
Another solution is to use typical meteorological year (TMY) data. A TMY is derived from long-term weather data: correlations and statistical distributions are analyzed to determine characteristic indices that represent average conditions [2]. Month-by-month data are then selected, according to these criteria, from 23 years of records. TMY data were established for 26 Canadian sites, and a similar concept, the test reference year (TRY), was applied in Europe [2]. While this approach reduces the computational effort and the database required to run simulations, it still relies on long-term data, which are not available in most places in the world, especially in developing countries.
As pointed out by Nguyen and Hoang [3], the shortage of weather data, especially solar radiation data, is very serious in developing countries. For example, in Vietnam, out of a total of 171 hydro-meteorological stations, only 12 have total solar radiation data, of which only 9 have continuous measurements. The remaining meteorological stations only record the number of hours of sunshine. Furthermore, radiation is measured manually every 3 hours rather than hourly. Therefore, hourly radiation data from hydro-meteorological stations in this country are not reliable enough to be used in simulation programs for solar energy systems.
There are two ways to address the lack of measured data at a site: (i) extrapolating data from adjacent hydro-meteorological sites with similar climate features, and (ii) synthetically generating a series of weather data from inputs that require, at a minimum, monthly averages. However, the first method can lead to large errors and, moreover, very few developing countries have such data available [2]. Therefore, the latter method has been studied and developed by many researchers.
Many researchers have proposed mathematical models to calculate complete series of weather data. Fernandez-Peruchena et al. [4] and Boland [2] used numerical methods to probabilistically simulate daily and hourly solar irradiance data series. Brecl and Topic [5] used a similar approach to generate daily and hourly solar irradiance data from average daily irradiance values. Bright et al. [6] and Hofmann et al. [7] also applied statistical and probabilistic techniques to generate series of solar irradiance values at 1-minute or 5-minute intervals from hourly solar irradiance data. Soubdhan and Emilion [8] even used a stochastic method to generate a sequence of solar radiation at one-second resolution. Magnano et al. [9] applied the same technique to generate a synthetic sequence of half-hourly temperatures. A common feature of the aforementioned studies is the use of a probability distribution function (PDF) of the data to normalize random variables and bring them to a Gaussian distribution [10].
Gafurov et al. [11] took another approach, incorporating the spatial correlation of solar radiation (SCSR) into stochastic solar irradiance generation models to produce monthly and daily solar irradiance time series.
Recently, several researchers have used different types of artificial neural networks (ANNs) to model the values of total solar irradiance on horizontal surfaces [12][13][14][15][16]. Wu and Chan [17] used a new combined model of ARMA (autoregressive moving average) and TDNN (time delay neural network) to predict hourly solar irradiance in Singapore. However, Mora-Lopez [10] pointed out that these methods act as "black boxes" when outputting and analyzing averages of daily global irradiance, so that no important information can be obtained from them. Mora-Lopez et al. [18] proposed using machine learning theory in combination with probabilistic finite automata (PFA) to calculate the values of total daily solar irradiance. The limitations of this method are that the use of PFA is complex and that the method has not been shown to be universally applicable.
The above review shows that stochastic methods remain widely applicable, simple and undemanding in input data. Therefore, in this study, a stochastic technique was chosen to generate series of weather data, namely daily and hourly solar irradiance and hourly ambient temperature, from monthly averages. These are important weather inputs for running numerical simulations of solar distillation systems. First, a stochastic model is used to generate a synthetic sequence of daily irradiance from monthly average daily solar irradiance values. The generated daily radiation sequences are then used to generate the hourly solar radiation sequences. Similarly, a model for generating hourly temperature series from monthly mean temperature values is also presented.

Model of generating daily solar radiation sequence
When analyzing 300 months of solar radiation data taken from 9 hydro-meteorological stations with different weather characteristics, Aguiar et al. [19] discovered that, for any time period, the daily solar irradiance values have a probability distribution function that appears to be related to the monthly mean clearness index, $\bar{K}_T$, for that period. Furthermore, they found that the daily radiation value of any given day is statistically related to the value of the previous day. Based on these findings, Aguiar and colleagues built a library of 10 Markov transition matrices (the MTM library) from the analysis of the 300 months of solar radiation data mentioned above. The 10 matrices are divided as follows: 1 matrix for $\bar{K}_T \le 0.3$, typical of months with a very low direct radiation component; 8 matrices for months where $\bar{K}_T$ varies from 0.3 to 0.7, in increments of 0.05 from one matrix to the next; and a final matrix for $\bar{K}_T > 0.7$, for months with a high direct radiation component. The MTM library was then used to generate daily radiation series from the average daily irradiance values for locations in the United States whose irradiance data had not been used to build the MTM library. The simulation results were compared with the measured radiation and with the simulation results from Graham's model [20]. When comparing statistical parameters such as the mean, variance and probability density function, as well as sequential characteristics (e.g. autocorrelation functions), Aguiar's model produced more accurate data series than Graham's model. Furthermore, Aguiar's model is computationally simpler than Graham's model [21].
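The selection and sampling steps described above can be sketched as follows. This is a minimal illustration, not Aguiar's published procedure: the function names are hypothetical, the transition matrix is a made-up placeholder (the actual MTM library values must be taken from [19]), and drawing the value uniformly within the selected state interval is a simplifying assumption.

```python
import bisect
import numpy as np

# Upper edges of the monthly mean clearness index classes described in the text:
# class 0: K <= 0.30, classes 1-8: 0.05-wide steps up to 0.70, class 9: K > 0.70.
CLASS_EDGES = [0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60, 0.65, 0.70]


def select_matrix_index(kt_month):
    """Return the index (0-9) of the library matrix for a monthly mean clearness index."""
    return bisect.bisect_left(CLASS_EDGES, kt_month)


def placeholder_matrix(n_states=10):
    """Hypothetical row-stochastic matrix; NOT one of Aguiar's published matrices."""
    i = np.arange(n_states)
    p = np.exp(-np.abs(i[:, None] - i[None, :]))  # favors staying near the current state
    return p / p.sum(axis=1, keepdims=True)


def generate_daily_kt(matrix, kt_min, kt_max, n_days, rng, start_state=0):
    """First-order Markov chain: today's state depends only on yesterday's state."""
    n_states = matrix.shape[0]
    edges = np.linspace(kt_min, kt_max, n_states + 1)  # equal-width state intervals
    state = start_state
    series = np.empty(n_days)
    for day in range(n_days):
        state = rng.choice(n_states, p=matrix[state])  # draw next state from current row
        # Assumption: pick the value uniformly inside the chosen state interval.
        series[day] = rng.uniform(edges[state], edges[state + 1])
    return series


rng = np.random.default_rng(0)
january = generate_daily_kt(placeholder_matrix(), 0.05, 0.75, 31, rng)
```

With the real library, `select_matrix_index` would pick one of the ten published matrices for each month, and the state limits `kt_min` and `kt_max` would be those associated with that matrix.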
To calculate the daily $K_T$ values from the monthly mean $\bar{K}_T$, the Aguiar method was chosen in this study for the following reasons:
• The expressions given by Graham are based on data from the United States, so they are not well suited to tropical climates [22]. This was confirmed by Nguyen and Pryor [21], who also found that, in high-$\bar{K}_T$ regions, these expressions do not fit the curves of Liu and Jordan [21].
• This disadvantage of Graham's method can be overcome by adding the expressions for tropical climates developed by the Aguiar group [19], but these expressions have not been fully verified and their use would create a complex, climate-dependent computational model.
• The locations whose measured data the Aguiar group used to build the MTM matrices cover many characteristic climate zones. Among them are one location with a central tropical rainforest (Caw) climate, Macao, and one location with a rainforest (Aw) climate, Polana, Mozambique. The method is therefore suitable for this study's focus on tropical climates, where most developing countries are located.

Data used to evaluate the accuracy of the model
Solar irradiance was measured in 2 cities representing 2 tropical climates in order to evaluate the accuracy of the daily solar radiation data generated by the model selected in this study: Ho Chi Minh City, representing the tropical forest climate (Aw), and Da Nang, representing the tropical monsoon climate (Am). Pyranometers were used to measure total irradiance on a horizontal plane in the two cities every 5 minutes, continuously from 5:30 a.m. to 6:30 p.m. Since these two cities are at low latitudes (10°N and 16°N, respectively), the day length does not change much during the year, so there is no need to extend the daily measurement period seasonally [3]. Hourly, daily and monthly average daily solar irradiance values are then calculated from the measured data. Table 1 presents the monthly average daily solar irradiance of the two cities, used to run the program that generates the daily and hourly irradiance data in this study.
From the series of daily radiation values for the 365 days of the year, the series of 365 daily clearness index values is calculated according to Eq. (1):

$$K_T = \frac{H}{H_0} \tag{1}$$

where $H$ is the total daily solar radiation measured on a horizontal plane and $H_0$ is the daily extraterrestrial radiation on a horizontal plane, calculated by [1]:

$$H_0 = \frac{24 \times 3600\, G_{sc}}{\pi}\left(1 + 0.033\cos\frac{360\,n}{365}\right)\left(\cos\phi\cos\delta\sin\omega_s + \frac{\pi\,\omega_s}{180}\sin\phi\sin\delta\right) \tag{2}$$

where $G_{sc}$, $n$, $\phi$, $\delta$ and $\omega_s$ are, respectively, the solar constant, the day of the year, the latitude of the investigated location, the declination angle and the sunset hour angle (in degrees), as defined in [1].
From the daily values, the monthly average daily clearness index $\bar{K}_T$ is calculated for the 12 months of the year:

$$\bar{K}_T = \frac{\bar{H}}{\bar{H}_0} \tag{3}$$

where the monthly average daily irradiance values $\bar{H}$ are taken from Table 1 and the monthly average daily extraterrestrial values $\bar{H}_0$ are calculated by Eq. (2) with $n$ being the average day of the month, given in [1]. Table 2 presents the $\bar{K}_T$ values of the 2 investigated cities. Figure 1 presents the procedure for calculating the series of daily clearness index values from the monthly average daily values.
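As a minimal sketch of the daily clearness index calculation, the following code evaluates the standard daily extraterrestrial radiation expression from [1] and divides the measured daily radiation by it. The solar constant of 1367 W/m² and Cooper's declination formula are assumptions taken from common practice in [1]; the function names are illustrative only.

```python
import math

G_SC = 1367.0  # W/m^2, assumed value of the solar constant


def declination_deg(n):
    """Declination angle in degrees for day of year n (Cooper's formula, assumed)."""
    return 23.45 * math.sin(math.radians(360.0 * (284 + n) / 365.0))


def sunset_hour_angle_deg(lat_deg, decl_deg):
    """Sunset hour angle in degrees, clamped for polar day/night."""
    x = -math.tan(math.radians(lat_deg)) * math.tan(math.radians(decl_deg))
    return math.degrees(math.acos(max(-1.0, min(1.0, x))))


def daily_extraterrestrial_radiation(n, lat_deg):
    """Daily extraterrestrial radiation on a horizontal plane, in J/m^2."""
    decl = declination_deg(n)
    ws = sunset_hour_angle_deg(lat_deg, decl)
    phi, delta = math.radians(lat_deg), math.radians(decl)
    orbit = 1.0 + 0.033 * math.cos(math.radians(360.0 * n / 365.0))
    geometry = (math.cos(phi) * math.cos(delta) * math.sin(math.radians(ws))
                + math.radians(ws) * math.sin(phi) * math.sin(delta))
    return 24.0 * 3600.0 * G_SC / math.pi * orbit * geometry


def daily_clearness_index(h_measured, n, lat_deg):
    """Ratio of measured daily radiation (J/m^2) to the extraterrestrial value."""
    return h_measured / daily_extraterrestrial_radiation(n, lat_deg)


# Example: a hypothetical measured value of 18 MJ/m^2 on day 105 at 10.8 deg N
kt = daily_clearness_index(18.0e6, 105, 10.8)
```

Applying `daily_clearness_index` to each of the 365 measured daily totals gives the daily series; using the monthly average radiation with the average day of each month gives the monthly mean values.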

Applying Aguiar's model
After the 365 values of the daily clearness index are calculated for each location, the generated series are compared with the measured series through statistical functions such as the cumulative distribution function (CDF) and the probability density function (PDF). Figures 2 and 3 present the CDF of the calculated and measured $K_T$ in Ho Chi Minh City and Da Nang, while Figures 4 and 5 present the PDF for these two cities. Statistical parameters including the mean, median, minimum, maximum, standard deviation, mean absolute error (MAE) and root mean square error (RMSE) were also compared between the calculated and measured $K_T$ series, as shown in Tables 3 and 4.
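The statistical comparison described above could be computed along the following lines. This is only a sketch: the text does not state how generated and measured values are paired for the MAE and RMSE, so the code assumes the two series are compared after sorting (that is, quantile against quantile), and the function names are illustrative.

```python
import numpy as np


def summary_stats(x):
    """Mean, median, minimum, maximum and sample standard deviation of a series."""
    x = np.asarray(x, dtype=float)
    return {"mean": x.mean(), "median": np.median(x),
            "min": x.min(), "max": x.max(), "std": x.std(ddof=1)}


def ecdf(x):
    """Empirical CDF: sorted values and their cumulative probabilities."""
    xs = np.sort(np.asarray(x, dtype=float))
    p = np.arange(1, xs.size + 1) / xs.size
    return xs, p


def mae_rmse(generated, measured):
    """MAE and RMSE between two equal-length series.

    Assumption: both series are sorted first, because a synthetic series is not
    expected to match the measured one day by day, only in distribution.
    """
    g = np.sort(np.asarray(generated, dtype=float))
    m = np.sort(np.asarray(measured, dtype=float))
    diff = g - m
    return np.mean(np.abs(diff)), np.sqrt(np.mean(diff ** 2))


mae, rmse = mae_rmse([0.3, 0.1, 0.2], [0.1, 0.2, 0.5])
```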
Figures 2-5 show that the Aguiar model produced daily $K_T$ series with an acceptable level of accuracy compared with the measured series. Similarly, the statistical parameters in Tables 3 and 4 show that the statistical error between the calculated and measured series is relatively small. Specifically, the mean and median error percentages of the generated series are 1% and −4% for Ho Chi Minh City and 6% and 14% for Da Nang, respectively. This model is therefore expected to be usable for generating series of the daily clearness index for any location, because the Aguiar model has been shown to be widely applicable around the world [2,4,5,19,22]. As shown above, the model only requires as input the 12 monthly average daily solar irradiance values at the location of interest.
There are many studies on mathematical models for generating daily and hourly radiation data series [2,4,5]. Basically, these studies are based on the approaches of Aguiar [19] and Graham [20].
Assuming that the hourly clearness index $k_t$ depends only on the daily clearness index $K_T$, Graham et al. [20] decomposed $k_t$ into two components: a mean (or trend) component and a random component.
The trend (regular) component $k_{tm}$ is calculated from the air mass $m$, evaluated at the middle of the hour, with parameters $\lambda$, $\rho$ and $\kappa$ that are functions of $K_T$ [20]. The standard deviation $\sigma_\alpha$ of the random component $\alpha$ is also expressed as a function of $K_T$ (Eq. (9)). A Gaussian mapping technique is then used to transform $\alpha$ into a Gaussian variable $\beta$ [20]. ARMA models were applied to the $\beta$ series, and $\beta$ was found to follow an AR(1) model:

$$\beta_t = \Phi\,\beta_{t-1} + \vartheta_t$$

where $\beta_{t-1}$ is the value of the variable at $t-1$, $\Phi$ is the autoregression coefficient, and $\vartheta_t$ is a random number from a normal distribution with zero mean and a standard deviation of $\sqrt{1 - \Phi^2}$. As in the case of daily values, the coefficient $\Phi$ varies slightly by locality, but the value 0.54 can be used in the model for all localities.
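The AR(1) step can be illustrated with the short sketch below. It assumes the innovation standard deviation is $\sqrt{1 - \Phi^2}$, which is the scaling that keeps the generated Gaussian variable at unit variance; the function name, the seed and the series length are arbitrary illustrative choices.

```python
import numpy as np


def generate_ar1(n, phi=0.54, rng=None):
    """Generate n values of a unit-variance AR(1) Gaussian series.

    phi is the autoregression coefficient (0.54 is the value quoted in the text).
    Assumption: innovations have standard deviation sqrt(1 - phi**2).
    """
    rng = np.random.default_rng() if rng is None else rng
    sigma = np.sqrt(1.0 - phi ** 2)
    beta = np.empty(n)
    beta[0] = rng.standard_normal()  # start from the stationary distribution
    for t in range(1, n):
        beta[t] = phi * beta[t - 1] + sigma * rng.standard_normal()
    return beta


beta = generate_ar1(50_000, rng=np.random.default_rng(42))
lag1 = np.corrcoef(beta[:-1], beta[1:])[0, 1]  # should be close to phi
```

In the hourly model, each generated value would then be mapped back to the random component of the clearness index and added to the trend component for that hour.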
Aguiar's method [23] for generating hourly $k_t$ is similar to Graham's method, with the following differences:
• First, the standard deviation $\sigma_\alpha$ depends not only on $K_T$ but also on the solar altitude angle $h_s$ [23].
• Second, the coefficient $\Phi$ depends on $K_T$ according to the following expression:

$$\Phi = 0.38 + 0.06\cos(7.4K_T - 2.5) \tag{15}$$

• Third, the calculated $k_t$ values are limited by the clear-sky clearness index $k_{cs}$:

$$k_{cs}(t) = 0.88\cos\left[\frac{\pi\,(t - 12.5)}{30}\right]$$

where $t$ is the hour considered.
Aguiar's model has been successfully applied to generate hourly solar irradiance in Spain and Slovenia [4,5], while Graham's approach has been successfully applied to generate hourly solar irradiance sequences for locations in the United States in particular and in many parts of the world in general [24]. On the other hand, when Aguiar's and Graham's models were used to generate and compare hourly solar irradiance for 6 locations in Australia, the results showed that Graham's model produced global solar irradiance sequences that fit better at all six sites [21]. Therefore, Graham's model was chosen to generate hourly solar radiation in this study.
However, Graham's model also has some disadvantages:
• The Gaussian mapping technique used to process the random component of the $k_t$ values is very complicated and time consuming.
• The error of the hourly clearness index series generated by Graham's model, relative to the measured series, is larger than the error of the model suggested in this study.
To avoid the complicated and time-consuming step in the Graham model, the $\beta$ values given in Eq. (10) are transformed to a non-standard distribution $h$ by using the Norminv function in MATLAB. The random components of the $k_t$ values are then calculated from $h$ and $\sigma_\alpha(K_T)$, the standard deviation computed by using Eq. (9). The suggested model is not only much simpler than Graham's method but also produces more accurate results. Table 5 shows the error of the hourly solar irradiance sequences generated by Graham's model and by this study's modified model, compared with the measured solar irradiance sequence for Ho Chi Minh City, over 20 runs of a program written in MATLAB. Figure 6 shows the procedure for creating the hourly clearness index series. This procedure is modified from the Graham model, as analyzed above. In this figure:
$\phi$ is the latitude of the investigated location.
$L_{st}$ and $L_{loc}$ are, respectively, the longitude of the standard meridian and the longitude of the location considered.

Modified Graham model to create a series of hourly clearness indices
$j$ is the month of the year.
$i$ is the day of the month.
$\omega_s$ is the sunset hour angle for the calculated day.
$K_T[i][j]$ is the daily clearness index of the $i$-th day in the $j$-th month.
$\omega$ is the calculated hour angle.
$k_{tm}$ is the "long-term" average value of $k_t$.
$\sigma_{kt}$ is the standard deviation of $k_t$ about the "long-term" average value.
$\varepsilon_t$ is a random number from a Gaussian distribution.
hr is the investigated hour.
$\chi$ is a Gaussian random variable with zero mean and unit variance.
$\theta_1$ is the parameter of the AR(1) model.
$F_{normal}$ is a function to convert a Gaussian variable into a non-normally distributed variable.

MATLAB is used to write the generation program for the hourly $k_t$ series.

Validating the generated hourly clearness index series
The generated daily clearness index values were used as input to the hourly $k_t$ series generation program. The calculated $k_t$ values are then compared with $k_{tmea}$, the measured hourly clearness index, given by:

$$k_{tmea} = \frac{I}{I_0}$$

where $I$ is the total solar radiation on a horizontal plane measured from hour angle $\omega_1$ to $\omega_2$ in Ho Chi Minh City and Da Nang, and $I_0$ is the extraterrestrial radiation on a horizontal plane from hour angle $\omega_1$ to $\omega_2$, given by [1]:

$$I_0 = \frac{12 \times 3600\, G_{sc}}{\pi}\left(1 + 0.033\cos\frac{360\,n}{365}\right)\left[\cos\phi\cos\delta\,(\sin\omega_2 - \sin\omega_1) + \frac{\pi\,(\omega_2 - \omega_1)}{180}\sin\phi\sin\delta\right]$$

The cumulative distribution function (CDF) graphs of hourly $k_t$ for Ho Chi Minh City and Da Nang are shown in Figures 7 and 8, respectively, while the probability density functions (PDF) of hourly $k_t$ for these cities are shown in Figures 9 and 10. Additionally, several statistical parameters of the measured and generated $k_t$ series in Ho Chi Minh City and Da Nang, including the mean, median, minimum, maximum, standard deviation, mean absolute error (MAE) and root mean square error (RMSE), are shown in Tables 6 and 7, respectively. As presented in Figures 7-10 and Tables 6 and 7, the hourly $k_t$ series for the two investigated cities have been generated by the suggested model with very high accuracy. The mean and median error percentages of the generated sequences were −1.5% and −2.4% for Ho Chi Minh City and −1.3% and 0.3% for Da Nang, respectively. Since stochastic models have been shown to have universal characteristics, as mentioned above, the model in this study is expected to be applicable to any location in the world.

Table 6. Statistical parameters of the $k_t$ series of Ho Chi Minh City.
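The measured hourly clearness index can be computed as sketched below, using the standard expression from [1] for the extraterrestrial radiation on a horizontal plane between two hour angles. The solar constant value, the declination formula and the function names are assumptions made for illustration; hour angles are in degrees, negative before solar noon.

```python
import math

G_SC = 1367.0  # W/m^2, assumed value of the solar constant


def declination_deg(n):
    """Declination angle in degrees for day of year n (Cooper's formula, assumed)."""
    return 23.45 * math.sin(math.radians(360.0 * (284 + n) / 365.0))


def hourly_extraterrestrial_radiation(n, lat_deg, omega1_deg, omega2_deg):
    """Extraterrestrial radiation on a horizontal plane between two hour angles, J/m^2."""
    phi = math.radians(lat_deg)
    delta = math.radians(declination_deg(n))
    w1, w2 = math.radians(omega1_deg), math.radians(omega2_deg)
    orbit = 1.0 + 0.033 * math.cos(math.radians(360.0 * n / 365.0))
    geometry = (math.cos(phi) * math.cos(delta) * (math.sin(w2) - math.sin(w1))
                + (w2 - w1) * math.sin(phi) * math.sin(delta))
    return 12.0 * 3600.0 * G_SC / math.pi * orbit * geometry


def hourly_clearness_index(i_measured, n, lat_deg, omega1_deg, omega2_deg):
    """Measured hourly radiation (J/m^2) divided by the extraterrestrial value."""
    i0 = hourly_extraterrestrial_radiation(n, lat_deg, omega1_deg, omega2_deg)
    if i0 <= 0.0:
        return float("nan")  # sun below the horizon for this interval
    return i_measured / i0


# Example: hypothetical 2.0 MJ/m^2 measured between 10:00 and 11:00 solar time
# on day 105 at 10.8 deg N (hour angles -30 deg to -15 deg)
kt_hour = hourly_clearness_index(2.0e6, 105, 10.8, -30.0, -15.0)
```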

Model to generate hourly ambient temperature sequences
The procedure to generate hourly ambient temperature sequences from the monthly mean ambient temperature, $\bar{T}_a$, and the monthly mean daily clearness index, $\bar{K}_T$, was described by Knight et al. [25]. A model to generate artificial hourly ambient temperature sequences for Australia was developed by Nguyen and Pryor [21].
First, to generate the deterministic component of the hourly ambient temperature series, the concept of the average normalized diurnal temperature variation, developed by Erbs et al. [26], is applied. Hourly measured ambient temperature data from two locations (Ho Chi Minh City and Da Nang) are used to calculate the hourly monthly-average ambient temperature, $\bar{T}_{a,h}$, at each hour of the day for each month. These curves are standardized by subtracting the monthly-average daily temperature, $\bar{T}_a$, from each of the hourly values and then dividing by the amplitude of the curve, $A$, defined as the difference between the maximum and minimum hourly average temperatures over the day. In this way, twelve standardized curves are derived for each location.
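The standardization step described above amounts to a shift and a scaling of each 24-value curve. A minimal sketch follows; the function name is hypothetical, and taking the daily mean as the mean of the 24 hourly averages when it is not supplied is an assumption.

```python
import numpy as np


def standardize_diurnal_curve(hourly_mean_temps, daily_mean=None):
    """Standardize a 24-value monthly-average diurnal temperature curve.

    Returns (standardized_curve, amplitude), where amplitude is the
    peak-to-peak difference of the hourly averages over the day.
    """
    t = np.asarray(hourly_mean_temps, dtype=float)
    if t.size != 24:
        raise ValueError("expected 24 hourly values")
    if daily_mean is None:
        daily_mean = t.mean()  # assumption: daily mean from the hourly averages
    amplitude = t.max() - t.min()
    return (t - daily_mean) / amplitude, amplitude


# Example with a synthetic curve peaking at 15:00 (illustrative values only)
hours = np.arange(1, 25)
curve = 27.0 + 3.0 * np.cos(2.0 * np.pi * (hours - 15) / 24.0)
standardized, amplitude = standardize_diurnal_curve(curve)
```

Averaging the standardized curves of all months and locations then gives the single normalized diurnal shape used as the deterministic component.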
The average of these 24 standardized curves (i.e., 12 curves × 2 locations) is calculated. Interestingly, the equation originally derived by Erbs and his colleagues [26] is found to fit the average standardized curve in this study. In that equation, time is expressed in hours, with 1 and 24 corresponding to 1 am and midnight, respectively.
The relation between the amplitude $A$ and the monthly mean clearness index, $\bar{K}_T$, is taken from [26]. After the trend components are removed from the ambient temperature values, the variable component of the data is converted into a normal distribution and then tested with many ARMA models. The AR(2) model is finally selected:

$$\chi_t = \Phi_1\,\chi_{t-1} + \Phi_2\,\chi_{t-2} + \varepsilon_t$$

Here, $\chi_{t-1}$ and $\chi_{t-2}$ are the values of the variable at $t-1$ and $t-2$, respectively, and $\Phi_1$ and $\Phi_2$ are calculated from the available ambient temperature data and found to be 0.9072 and −0.1430, respectively. $\varepsilon_t$ is a random number from a normal distribution with zero mean and a standard deviation of $\sqrt{1 - \Phi_1 r_1 - \Phi_2 r_2}$, where $r_1$ and $r_2$ are the corresponding autocorrelation coefficients.

Table 7. Statistical parameters of the $k_t$ series of Da Nang.

The generated $\chi$ is transformed to an hourly temperature by equating the cumulative distribution function of $\chi$, $F_{normal}$, with that of the hourly ambient temperature, $F_{temp}$. $F_{normal}$ is given by:

$$F_{normal}(\chi) = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{\chi}{\sqrt{2}}\right)\right]$$

whereas $F_{temp}$ is calculated following [25], using a normalized temperature variable (Eq. (25)) in which $N_m$ is the number of hours in the month and $\sigma_m$ is the standard deviation of the monthly-average daily temperature $\bar{T}_a$; $\sigma_{yr}$ is the standard deviation of the yearly average ambient temperature. Solving these equations gives the hourly ambient temperature (Eq. (27)). Figure 11 shows the schematic diagram of the procedure to generate hourly ambient temperature sequences.
In this figure:
$L_{st}$ and $L_{loc}$ are the standard meridian for the local time zone and the longitude of the location considered, respectively.
$\bar{T}_a[j]$ and $\bar{K}[j]$ are the monthly mean ambient temperature and monthly mean daily radiation of the $j$-th month, respectively.
$\bar{T}_{a,yr}$ is the yearly average ambient temperature, calculated from the 12 monthly values.
$\sigma_m[j]$ is the monthly standard deviation of the $j$-th month, obtained from the yearly average value and the monthly average temperature for that month.
$A[j]$ is the amplitude of the diurnal variation (peak to peak) of ambient temperature, and is a function of the monthly average daily clearness index.
$\bar{T}_{a,h}[avhr]$ is the hourly monthly-average ambient temperature; the subscript "avhr" indicates the calculated monthly-average hour (avhr = 1 to 24).
$\varepsilon_t$ is a random number from a Gaussian distribution.
hr is the hour considered.
Nmax is the number of hours in the respective month.
$\chi$ is the normally distributed stochastic variable with a mean of 0 and a variance of 1.

Figure 11. Schematic diagram of the procedure to generate hourly ambient temperature sequences.
$\Phi_1$ and $\Phi_2$ are the first and second parameters of the AR(2) model.
$F_{normal}$ is the cumulative distribution of a normally distributed variable.

Figure 12 shows the cumulative distribution of hourly ambient temperature sequences for Ho Chi Minh City. The figure compares the results obtained from measured data with those from artificial data generated using the equations described in this section. As shown, the model presented here produced accurate hourly ambient temperatures in comparison with measured data.
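The stochastic part of the temperature model described in this section can be sketched as follows, using the AR(2) parameters quoted in the text. Obtaining the autocorrelation coefficients from the Yule-Walker relations, the burn-in length and the function names are assumptions made for illustration; the conversion from the normal CDF value to a temperature follows [25] and is not reproduced here.

```python
import math
import numpy as np

PHI1, PHI2 = 0.9072, -0.1430  # AR(2) parameters quoted in the text


def yule_walker_autocorr(phi1, phi2):
    """Lag-1 and lag-2 autocorrelations implied by AR(2) parameters (assumed relation)."""
    r1 = phi1 / (1.0 - phi2)
    r2 = phi1 * r1 + phi2
    return r1, r2


def innovation_std(phi1, phi2):
    """Standard deviation of the random term that gives a unit-variance series."""
    r1, r2 = yule_walker_autocorr(phi1, phi2)
    return math.sqrt(1.0 - phi1 * r1 - phi2 * r2)


def generate_ar2(n, phi1=PHI1, phi2=PHI2, rng=None, burn_in=200):
    """Generate n values of the AR(2) Gaussian variable, discarding a burn-in."""
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.normal(0.0, innovation_std(phi1, phi2), n + burn_in)
    chi = np.zeros(n + burn_in)
    for t in range(2, n + burn_in):
        chi[t] = phi1 * chi[t - 1] + phi2 * chi[t - 2] + eps[t]
    return chi[burn_in:]


def normal_cdf(x):
    """Cumulative distribution of a standard normal variable."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


chi = generate_ar2(50_000, rng=np.random.default_rng(7))
```

Each value of `chi` would be passed through `normal_cdf`, and the resulting probability matched to the temperature distribution of the month to obtain the hourly temperature.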

Validating generated versus measured weather data by running a solar distillation simulation program
The main objective of this study is to build models to generate weather data, including daily and hourly solar radiation sequences and ambient temperature series, that can then be used to run simulation programs. Therefore, the generated weather data are used as input for SOLSTILL, a simulation program for solar distillation systems [27]. This program was designed to simulate both passive solar stills and active solar distillation systems. Figure 13 presents the heat and mass transfer diagram of a passive solar still, whereas Figures 14 and 15 show, respectively, the schematic diagram and the heat and mass transfer process of a forced circulation solar still.
The inputs for SOLSTILL can be hourly weather data if measured hourly solar radiation and ambient temperature are available. If not, the weather data generation function in SOLSTILL can be called to generate weather sequences from the monthly average daily solar radiation and ambient temperature values of the 12 months [27]. Both weather data modes in SOLSTILL (i.e., hourly input mode and generation mode) are used in this study. For Mode 1, hourly measured weather values, obtained from the National Center for Hydro-Meteorological Forecasting [28], are input to the program. For Mode 2, the weather data generation function in SOLSTILL produces the weather sequences. The outputs of SOLSTILL consist of the hourly amounts of distillate water, the hourly temperatures of the cover, basin water and basin, etc. In this study, only the hourly amounts of distillate water are considered, from which the daily and monthly average daily amounts of distillate water are obtained. Figures 16 and 17 show the monthly average daily distilled water of a conventional solar still for Ho Chi Minh City and Da Nang, respectively, whereas Figures 18 and 19 show those of a forced circulation solar still with enhanced water recovery.
As shown in Figures 16-19, the differences in predicted monthly average daily distillate productivity between runs with measured and generated weather series are very small for both the conventional passive solar still and the forced circulation solar still. The largest error is 9.3%, which occurred in April in Ho Chi Minh City for the conventional solar still. The errors in predicted yearly average daily distillate productivity are less than 5%. Therefore, the weather data generated by the proposed models can be expected to be usable for running simulation programs for solar distillation systems.

Conclusion
In this study, Aguiar's model was chosen to generate daily clearness index sequences for Ho Chi Minh City and Da Nang, two cities representing two climate types in the tropical region. A modified Graham model was then proposed to generate hourly clearness index sequences from the generated daily clearness index series for these two locations. After that, a model to generate hourly ambient temperature sequences from monthly average daily ambient temperatures was presented. As shown by the statistical comparisons and by the predicted distillate productivities of the solar still simulations, the models in this study predict daily and hourly irradiance and ambient temperature sequences accurately. In particular, the model proposed in this study to generate hourly solar radiation values is much simpler than Graham's model. Therefore, the solar radiation and ambient temperature generation models in this work are expected to be usable for calculating daily and hourly weather data for numerical simulation programs of solar distillation systems with very limited input parameters: the latitude, the monthly average daily clearness index and the monthly average ambient temperature of the investigated location.