## 1. Introduction

Portions of western Puerto Rico are subject to flash flooding due to sudden, extreme rainfall events, some of which fail to be detected by NEXRAD radar located approximately 120 km away in the town of Cayey, Puerto Rico, and partially obstructed by topographic features. The use of new radars with higher spatial resolution and covering areas missed by the NEXRAD radar is important for flood forecasting efforts and for studying and predicting atmospheric phenomena.

The use of a radar network in Puerto Rico western area will monitor the lower atmosphere where the principal atmospheric phenomena occur. This work represents the first time that TropiNet radar technology will be used for hydrologic analyses and specifically for rainfall forecasting in western Puerto Rico.

Short-term rainfall forecasts have commonly been made using quantitative precipitation forecast (QPF). The introduction of quantitative precipitation forecasting (QPF) in flood warning systems has been recognized to play a fundamental role. QPF is not an easy task, with rainfall being one of the most difficult elements of the hydrological cycle to forecast [1], and great uncertainties still affect the performances of stochastic and deterministic rainfall prediction models [2].

Currently, this capability does not exist in western Puerto Rico, and it is needed because of the potential for flooding in certain areas (e.g., in flood plains near the principal rivers of the region). In this research, short-term rainfall forecast analysis is performed using nonlinear stochastic methods. Once obtained, the rainfall forecast is introduced into a hydrologic/inundation model Vflo and into the Inundation Animator configured for the Mayagüez Bay Drainage Basin (MBDB).

Specific components of the research are: the inclusion of calibration and validation of rainfall estimates produced by the TropiNet radar network and the development and validation of the stochastic rainfall prediction methodology.

## 2. Stochastic modeling and short-term rainfall forecasting

There are many approaches that can be used to predict the future direction and magnitude of a physical process, such as rainfall. Forecasting is a large and varied field having two predominant branches: qualitative forecasting and quantitative forecasting [3]. Quantitative forecasting should satisfy two conditions, the accessible numerical information about the past and assumptions that some aspects of the past patterns will continue into the future. Quantitative forecast can be divided into two classes: time series and explanatory models. Explanatory models assume that the variables to be forecasted exhibit an explanatory relationship with one or more other variables, in contrast, time series forecasting uses only information on the variable to be forecasted and makes no attempt to discover the factors affecting its behavior [3].

A number of researchers have developed hurricane prediction tracking models in Puerto Rico. For example, see Ref. [4] used historical data to develop a stochastic model to predict the behavior of hurricane tracks. The parameter estimation scheme, based on recursive and iterative algorithms, used historical records for hurricanes to fit vector autoregressive models. The identified models have been classified according to the order of the model. The first observations of a given hurricane are compared with historical hurricane tracks. The author [4] concluded that the vector ARMA model has excellent potential and may help reduce official forecasting error compared with a Statistical-Dynamical Hurricane Track Prediction Model (NHC90) from the National Meteorological Center [5]. Introduce a rainfall forecast methodology based on NEXRAD data, which was used as the basis to formulate the new rainfall methodology.

Other examples of rainfall forecasting models were developed and available in the literature. Prediction of Rainfall Amount Inside Storm Events (PRAISE) is a stochastic model developed by [6] to forecast rainfall height at site. PRAISE is based on the assumption that the rainfall height accumulated on a delta time is correlated with a variable that represents antecedent precipitation. The mathematical background is given by a joined probability density function and by a bivariate probability distribution, which is referred to the random variable that represents rainfall in a generic site and antecedent precipitation in the same site. The peculiarity of PRAISE is the availability of the probabilistic distribution of rainfall heights for the forecasting hours, conditioned by the values of observed precipitation.

## 3. Methodology

The University of Puerto Rico at Mayagüez has a research weather radar network and a rain gauge network developed by Dr. Luz Estella Torres-Molina for research work. The radar network provides information with higher spatial and temporal precision. TropiNet (radar) has a 60 × 60 m spatial resolution at every pixel and temporal resolution of 1 min. A flood warning model must be operated based only on the data available at the time of forecast. Only the radar can display data in real time. This is not possible using rain gauges, but the rain gauges are used as data validation. Rain gauges-based systems must have a dependable and redundant telemetry system that will accurately and efficiently transmit data to a central location for processing. The data from TropiNet radar are used for rainfall prediction in the watershed, using stochastic methods. Once the rainfall forecast is obtained, the use of hydrologic models is necessary for analysis of flooding in this area. This research is focused on the western Puerto Rico and could be applied in general to other areas or regions with the same rainfall type with the corresponding hydrologic soil and coverage data.

The study area, which encompasses the watershed, is 819.1 km^{2} in area [7] and is located in western Puerto Rico. The region has three important watersheds: Río Grande de Añasco, Río Guanajibo, and Río Yagüez. The area includes 12 municipalities: Mayagüez, Añasco, Las Marías, San Sebastián, Lares, Maricao, Yauco, Adjuntas, Sabana Grande, San Germán, Hormigueros, and part of Cabo Rojo.

Flood problems in this study area are serious and widespread. Periodic flood damage to pastureland, roads, and a number of residential areas is significant. Flood waters have inundated the main Río Grande de Añasco floodplain 17 times in a period of 31 years, an average of approximately once every 2 years. The floodplain of the lower Rio Grande de Añasco has been inundated extensively at least six times during the period 1899–1975. One of the greatest known floods occurred in September 1975. Another great flood was that of September 1928. Other major floods occurred in September 1932, September 1952, October 1970, August 1899, and September 1899 [8].

The soil textures present in this study as percent of area are clay with 62.49%, clay-loam 24.96%, rock 8.69%, loam 3.00%, sand 0.81%, and gravel 0.04%. A soil map describing the class distribution is necessary to assign the values of the Green-Ampt infiltration parameters (see **Figure 1**).

Developed an algorithm of Water and Energy Balance for Puerto Rico using data from the geostationary operational environmental satellite (GOES) [9]. GOES-PRWEB utilizes an energy balance approach similar to [10]. The latent heat flux component of the algorithm is used to estimate actual evapotranspiration. The algorithm depends on solar radiation, which is determined using GOES satellite data. Ref. [11] were the first to propose a physical model for estimating the incident solar radiation at the surface from the Geostationary Operation Environmental Satellite.

Ref. [9] provided solar radiation data with spatial resolution of 1 km for Puerto Rico. In this research, we developed a subroutine in MatLab to convert the original 1 km resolution to 200 m resolution to obtain potential evapotranspiration (PET) estimation in a resolution compatible with our hydrologic model.

Twenty different classes of land cover and forest type are present over the study area corresponding to different kind of forest, woodland, and agriculture. The classification of land cover in this model is used to assign values for physical parameters which are important in the simulation with the hydrological model *Vflo*, and other important parameters with the land use are Manning’s roughness coefficient, rainfall interception, evapotranspiration, crop coefficient, and other. Ref. [11] classified the land use for this watershed in six major categories, shrub land, woodland, and shade coffee with an area of 529.16 km^{2}, pastures with 172.84 km^{2} of area, urban and barren area with 60.02 km^{2}, agriculture with 55.06 km^{2}, other emergent wetlands with 1.26 km^{2}, and quarries, sand, and rock with 0.75 km^{2}.

The climate of the study area is considered humid subtropical. The average temperature at the Mayagüez City, Puerto Rico station, is 70.7°F between the years 1971 and 2000, and the average maximum temperature in the Mayagüez city station between the years 1971 and 2000 is 88.7°F [12].

The amount of rainfall varies considerably throughout the study area. Most of the rainfall occurs during the month of September with 10.62 inches on average. The months of January through April are considered the dry season with 1.60 inches in January, 2.59 inches in February, 3.35 inches in March, and 4.17 inches in April on average rainfall. In the west, the sea breeze effect carries wet air from the Mona Channel eastward, converging with the Trade Wind and resulting in intense convective rainstorms almost every afternoon within the watershed during the wet season. Rainfall and temperature data were obtained from the National Climatic data Center [12].

**Table 1** includes the dates and specifications of every storm to the current study.

## 4. Radars

Radars are active sensors that emit electromagnetic pulses into the surroundings. A typical radar system consists of at least the following four components: a transmitter that generates high frequency signals, an antenna that sends the signal out and receives the echoes returned, a receiver that processes the returned signals, and a data display system [13]. Lower frequency and higher wavelength suggest that the radar has robust signal power and less attenuation, and the weather radar system discussed in the current research is based in X-band.

The TropiNet radars are Doppler polarimetric radars which allow the radar beam to measure reflectivity close to the ground, overcoming the shadow effect of the Earth’s curvature, while maintaining high range and azimuth. The first TropiNet radar has been in operation since February 2012. TropiNet 1 is located in “Cerro Cornelia” Cabo Rojo, Puerto Rico, at 18.16°N, 67.17°W, and 200 ft elevation (msl), approximately. The radars, working with the X-band frequency, are about three times stronger than that of the traditional radar frequencies at S-band, making the measurements of rainfall more attractive. They have high space and time resolution for weather monitoring and detection and are capable of generating very high resolution data with a range of 40 km of radius or maximum radial distance (horizontal range) of 80 km of diameter.

Ref. [14] found a series of radars used for hydrological modeling. Additionally, they have indicated that in convective precipitation, when very steep horizontal gradients are observed, the information from a single rain gauge can be misleading. It must be stressed that radar and rain gauges are not competitive to each other because radars are used for spatial and temporal measurement, while the rain gauge only measures rainfall at a given single location [14]. If the comparison of a storm total is necessary, [15] explain that storm totals may be more accurately estimated by radar than any particular hourly accumulation when compared to rain gauges.

TropiNet radar being Doppler and Polarimetric can show velocity data of the cloud and reflectivity for every azimuth angle from 0 to 12°. TropiNet displays reflectivity logarithmically (10 log(*Z*)) or dBZ. The working frequency is 9.41 GHz ± 30 MHz, which corresponds to the X-band (in free space has a 3.19 cm wavelength). The TropiNet radar was designed and developed by Colorado State University (CSU) and UPRM to serve as the principal Internet-controllable node of the TropiNet radar network [16].

To analyze the data, it was necessary to develop a model to convert raw data into NetCDF data and then convert the reflectivity data in dBZ to rain rate in (mm/hr) using empirically derived *Z-R* relationships to transform reflectivity to rain rate. Ref. [17] equation is the default *Z/R* relationship employed by the WSR-88D and TropiNet.

NOAA-NWS [1995] report recommended that *Z-R* relationship in use at the time of the event be changed from *Z* = 300*R*^{1.4} to a relationship more representative of raindrop distributions in a warm tropical storm. The *Z-R* relationship for warm tropical events recommended by the NWS Operational Support Facility since 1995 for all WSR-88D sites experiencing heavy rainfalls and now adopted by TropiNet is *Z* = 250*R*^{1.2} [15].

The *Z-R* relationship used in Puerto Rico is the convective; furthermore, it was necessary to define a maximum precipitation rate threshold for decibels above 53 dBZ [15]. The convective rainfall is a type of precipitation with some characteristics like very high horizontal gradient and very large vertical depths. These characteristics mean that the weather radar is the best tool for detecting convective precipitation, but the presence of different types of hydrometeors, especially hail and storm dynamics yielding fast varying vertical profile reflectivity (VPR), usually results in considerable random error in quantitative precipitation estimates. Large differences can be found especially when comparing rain gauges and radar estimates because of the high temporal and spatial variability of the convective storm and related vertical profile of reflectivity [14].

A radar application in MatLab was developed to access the store of binary volume files that contain the respective information as determined by the operator like reflectivity, azimuth, velocity, beam width, range, elevation, and other radar products. The operator can apply one of several possible scan configurations. For instance, in the range height indicator (RHI), the radar holds its azimuth angle constant but varies its elevation angles. This is essential to provide vertical resolution where the radar continuously scans through elevation angles at a given azimuth angle. Another common scan configuration is the plan position indicator (PPI), where the radar holds its elevation angle constant but varies its azimuth angle, rotating through 360°.

For this research, it was necessary to hold the radar scan in PPI with a constant elevation angle of 3°. Every radar scan has two angles of 3° and 5° with a duration time of 30 seconds. The data information is saved in the server. The raw data files are stored by date every hour, minute, and second of scan in binary format. Each volume scan from radar has been interpolated to a fixed Polar grid and, after it is necessary, to convert to the fixed Cartesian grid. As part of the effort to further post-process the radar data, a model in MatLab was developed. This model performs the conversion from raw data polar coordinate system into ASCII data in Geographic coordinate system necessary for the hydrological software, *Vflo*.

## 5. Hydrologic model

The hydrologic model used in this research is *Vflo* [18]. *Vflo* is a fully distributed physically based hydrologic (PBD) model capable of utilizing geographic information and multi-sensory input to simulate rainfall runoff from major river basins to small catchments.

*Vflo* is a hydraulic approach to hydrologic analysis and prediction. Overland flow and channels are simulated using the Kinematic Wave Analogy (KWA). The model utilizes GIS grids to represent the spatial variability of factor controlling runoff. Runoff production is from infiltration excess and is routed downstream using Kinematic Wave Analogy. Computational efficiency of the fully distributed physics-based model is achieved using finite elements in space and finite difference in time. *Vflo* is suited for distributed hydrologic forecasting in post-analysis and in a continuous operation mode, derives its parameters from soil properties, land use, and topography, and in this case, the precipitation is obtained from radar TropiNet. The goal of distributed modeling is to better represent the spatial-temporal characteristics of a watershed governing the transformation of rainfall into runoff.

The hallmark of *Vflo* is prediction of flow rates and stages for every grid cell in a catchment, watershed, river basin, or region. *Vflo* provides high-resolution, physics-based distributed hydrologic modeling for managing water from catchment to river basin scale. Improved hydrologic modeling capitalizes on access to high-resolution quantitative precipitation estimates from model forecasts, radar, satellite, rain gauges, or combinations of multisensor products.

Model input consists of rain-rate maps at any time interval from radar or multisensor sources. Data input for this model (besides rainfall) is derived from various commonly available sources of digital data. Parameters include topography and drainage networks derived from a digital elevation model (DEM), infiltration derived from soils, and hydraulic roughness derived from land use/cover. These parameters may be input and edited manually or via ArcView grids.

For calibration performance, there is a sequence called the “ordered physics based parameters adjustment” (OPPA) method developed by [19]. The calibration process (OPPA) approach includes estimates of the spatially distributed parameters from physical properties, assigns channel hydraulic properties based on measured cross-sections where available, studies model sensitivity for the particular watershed, and identifies response sensitivity to each parameter. It furthermore runs the model for a range of storm from small, medium to large events. It observes the characteristics of the hydrograph over the range of storm size and any consistent volume bias. Then, it derives a range of response for a given change in a parameter and categorizes and ranks parameter sensitivity according to response magnitude.

The optimum parameter is that set which minimizes the respective objective function and matches volume by adjusting hydraulic conductivity. It can match the peak by adjusting overland flow roughness and re-adjust hydraulic conductivity and hydraulic roughness if necessary. The *Vflo* model does not simulate base flow, only direct runoff; it can be simulated assigning a fixed value to every channel cell for every event to simulate.

For a long-term analysis, it is necessary to quantify the base flow using known methodologies [20]. The OPPA procedure outlined above can be stated as: increasing the volume of the hydrograph is achieved by decreasing hydraulic conductivity, and similarly, increasing peak flow is achieved by decreasing hydraulic roughness.

*Vflo* model uses finite elements which can simulate streamflow based on geospatial data to simulate interior locations in the drainage network and determine channel flow and overland flow. It was fundamental to study the physical configuration of the watershed, such as a digital elevation model (DEM), the digitized topography, soils map, land use map, and information about the basin.

Some stations from the USGS were used to compare and validate the runoff with the results from the hydrological model using radar data. Ref. [8] implemented the most recent Flood Insurance Study (FIS) for the Commonwealth of Puerto Rico. Standard hydrologic and hydraulic study methods were used to determine the flood hazard data required for this countywide FIS. The flood events have magnitude of exceeding once at any given day during the recurrence period of 10, 50, 100, and 500 years. These events have a percent chance of 10, 2, 1, and 0.2%, respectively. The equation employed was mean annual rainfall (MAR) obtained from mean annual precipitation (MAP) developed by NOAA in 2006 [precipitation record 1971–2000]. The regression analysis was performed based on depth to rock (DR) and contributing drainage area (CDA) as variables that govern the peak streamflow.

Other product in this research was potential evapotranspiration. A GOES satellite-based potential evapotranspiration (*PET)* product, with resolution of 1 km over the entire island each day, was used in this research. One of the most used methods to calculate PET or reference evapotranspiration, and the method used in this study, is the FAO Penman-Monteith method. A large number of empirical methods have been developed over the last 50 years, and the Penman-Monteith method was considered to offer the best result with minimum possible error. The Penman-Monteith reference evapotranspiration equation is given by

where *ETo* is reference evapotranspiration (mm day^{−1}), *R*_{n} is net radiation at the crop surface (MJm^{−2}day^{−1}), *G* is soil heat flux density (MJm^{−2}day^{−1}), T is mean daily air temperature at 2 m height (°C), *u*_{2} is wind speed at 2 m height (ms^{−1}), *e*_{s} is saturation vapor pressure (kPa), *e*_{a} is actual vapor pressure (kPa), *e*_{s} – *e*_{a} is saturation vapor pressure deficit (kPa), Δ is slope vapor pressure curve (kPa°C^{−1}), and γ is psychrometric constant (kPa°C^{−1}). The equation uses standard climatological records of solar radiation, air temperature (°C), humidity, and wind speed (ms^{−1}). The weather measurement should be made at 2 m (or converted to that height) above an extensive surface of an hypothetical green grass with an assumed height of 0.12 m, a fixed surface resistance of 70 sec m^{−1}, and an albedo of 0.23.

Also a slope map was developed using the digital elevation map (DEM) at 200 and 10 m resolution from USGS. The digital elevation model (DEM) data consist of a sampled array of regularly spaced elevation values referenced horizontally either to a Universal Transverse Mercator (UTM) projection or to a geographic coordinate system. The grid cells are spaced at regular intervals along south to north profiles which are ordered from west to east. **Figure 2** presents the slope map for the basin derived from the DEM at 200 m resolution.

The watershed presents the six basic textures into the basin area, where a large amount area of clay is observed. The soil names present into the clay area are: Alluvial land, Aguilita, Aibonito, Bajura, Consumo, Daguey, Delicias, Humatas, Lares, Jacana, Los Guineos, Malay, Mabi, Mariana, Mariaco, Montegrande, Mucara, Nipe, and other. For the clay-loam texture, the soil names present are Anones, Caguabo, Descalabrado, and Morado. For the loam texture, the soils are Coloso, Corcega, Dique, Guainabo, Mani, Maresua, Palmarejo, Reilly, Talante, Toa, and other. Soils that correspond to the rock texture are including Limestone, Serpentine, and Volcanic rock land, for the sand texture was found in the soils: Cataño, Leveled and River wash and the last texture is Gravel which only has one soil with the same texture name (**Figure 3**).

**Figure 4** presents the percent's of textures into the watershed, in which the clay encompasses most of the study area with 63% of total area.

On the other hand, the minimum texture present in the basin is the Gravel with a value approximate to 0.02%.

The hydrologic group is a parameter that affects the infiltration and runoff. **Figure 5** presents the basin area with the hydrologic groups A, B, C, and D. The most representative groups are C and D: C with a 32% total area and D with a 40% total area.

Other parameters such as hydraulic conductivity, wetting front, and effective porosity were assigned from literature [20, 21]. The hydraulic conductivity (*K*) may especially control the infiltration process when rainfall occurs over already saturated soil; the hydraulic conductivity was specified for a single layer soil profile for this study area.

The wetting front is the average capillary potential of the Green-Ampt infiltration routine, and this parameter is important because it can calculate infiltration under unsaturated conditions and its value is independent of soil moisture at any particular time. The effective porosity is the difference between total porosity and residual soil moisture content. This property is independent of soil moisture at any time, its range is between 1 and 0, with complete porosity being a value of one (1), and the value zero (0) is for the zero porosity. The soil depth is the depth to which the infiltration can occur in the soil, and if the wetting front is obstructed by a perched water table, then the depth to the water table is the limiting depth. If the soil profile is limited by an impermeable layer, then the depth to that layer is the limiting depth. Soil depth can be modified through the calibration of simulations to observed streamflow [21]. The soil depth data were obtained using USDA [22–25].

## 6. Types of forecasts

There are some properties needed to distinguish between different types of forecast. Forecast can extend to different scales in space and time; the spatial is doing reference in a fixed location in a specific area of city, e.g., the precipitation on a grid from TropiNet radar over Mayagüez city. The temporal range of a forecast is furthermore called lead-time. Short-range forecast covers very close events, like the next few hours or next minutes as in our case, and the long-range forecast considers the mean value of a meteorological parameter over a few days or months.

In this research, the data are correlated in space and time, where the strength in general decreases with spatial and temporal distance. Our models are designed to do forecast in time and space. This increases the difficulty as compared with prediction models that only use the forecast in time at a given place (e.g., forecast in rain gauges).

Other types of forecast are deterministic. In this case, a single forecast value is issued at each occasion, pretending a confidence that hides the forecaster’s uncertainty about the outcome. They are easy to interpret even for user without stochastic background knowledge. The simplest case is a deterministic binary forecast. This area decision, like yes or no, and additionally a generalization in the forecast if necessary, distinguishes between types of variables to be forecasted. The variable of interest can be ordinal, which can be expressed by a number and can be defined by an appropriate number of threshold values (e.g., light rain, middle rain, or heavy rain).

Other variable of interest is the nominal, where there is no natural ordering, like qualitative observation of the kind of precipitation (e.g., snow, rain, ice, or other). A deterministic evaluation is furthermore named quantitative precipitation forecast (QPF), which induces the user to suppress information and judgment about uncertainty. In fact, it may create the illusion of certainty, while a probabilistic forecast is indicated as probabilistic quantitative precipitation forecast (PQPF). In order to reflect the uncertainty of the future outcome, probabilistic statements are more appropriate.

For this research, a methodology that embraces a space-time stochastic model is used and is considered a “discrete time-series model” that includes a special kind of nonlinear model with stochastic and deterministic components. Here, the rainfall process is described at a discrete time steps, is not intermittent, and, therefore, can be applied for describing the forecast within storm rainfall.

The other case is the use of meteorological model. These are useful qualitative and quantitative rainfall forecasting tools on 24–72-h interval and on a large spatial scale. In such cases, indeed absolute precision is not required for practical application. In meteorological models when the forecasting lag time and spatial scale decrease, the effectiveness and the precision of kind of model additionally decrease [26].

Autoregressive-moving-average models (ARMA) are mathematical models of autocorrelation in a time series. ARMA models are widely used in hydrology and were popularized by [27] who elaborated a comprehensive theoretical and practical development of time series models. There are several possible reasons for fitting ARMA models to data. ARMA modeling can contribute to understanding the physical system by revealing something about the physical process that builds persistence into the series. ARMA models can additionally be used to predict behavior of a time series from past values alone. Such a prediction can be used as a baseline to evaluate possible importance of other variables to the system.

The model consists of two parts: an autoregressive (AR) part and a moving average (MA) part. The AR model expresses a time series as a linear function of its past values. The order of the AR model indicates how many lagged values are included. The moving average (MA) model is a form of ARMA model in which the time series is regarded as a moving average of a random shock. The model is usually then referred to as the ARMA(*p,q*) model where *p* is the order of the autoregressive part and *q* is the order of the moving average part. ARMA models in general, after choosing *p* and *q*, are fitted by iterative procedure of a nonlinear least squares regression to find the values of the parameters which minimize the error term. The ARMA modeling process is commonly an iterative, trial, and error process. Thus, it is necessary to use the least possible number of parameters that will adequately produce forecasted values with similar statics of the historical data [28].

ARMA is a methodology widely used to do predictions of all types, for economy as well as for the weather predictions. In any case, it is necessary to have a long historical data. In the literature, ARMA model has been used to predict at one or two rain gauges at a single point but not at radar field. Since the ARMA model predicts at a single point, this is an important reason to avoid the use of ARMA methods in this research. This principle was applied to this thesis or this model and at the same time to the principle of parsimony to obtain results in the model with small possible error.

The Point process is a type of random process for which any action consists of a set of isolated points in time or in space. The example more global in point process model is the Poisson process that counts the number of events (storm) and the time that these events occur in a given time interval. Usually, the time between each event’s development has an exponential distribution and the numbers of occurrences are independent of each event (storm).

The Point process model has been used commonly to forecast rainfall in which storm origins occur in a Poisson process. The Point process model is applied at a single site or fixed point where the storms arrive in a Poisson process. Each storm incorporates a group of random number of rain cell, where each cell has a random duration or lifetime and depth. The total rate of precipitation at time (*t*) is the sum of contributions from all active cells at (*t*) [29]. This type of model uses complex equations, and the analysis of precipitation is in time at a fixed point in space and the properties of the natural process can be deduced via the mathematical model.

Ref. [30] have a model for daily rainfall in which wet and dry days occur in a Markov chain with seasonally dependent transition probabilities. In it, the amounts of rain per wet day have a gamma distribution with seasonally dependent parameters.

An algorithm for predicting 10, 20, and 30 min in advance the spatial distribution of rainfall rate is introduced in this work. The algorithm is based on the assumption that TropiNet radar rainfall rate data provide estimations of the rainfall with high spatial and temporal resolution. Some researchers have compared radar rainfall data with rain gauge measurements [31–33]. These comparisons may not be useful since a rain gauge measures precipitation at a single point located at the surface level, whereas the weather radar measures the average of reflectivity at certain elevation and over a much larger area. A stochastic function is used to estimate the rainfall rate based on reflectivity.

When a rain gauge is compared with radar, it is expected that the average rainfall will behave as an individual point. It is known that the average will behave differently than that of an individual observation; therefore, these quantities should not be expected to be equal. When several rain gauges are averaged and compared with the radar measurements, the average of the rain gauges is inconsistent because it was developed with few points, whereas the average of the radar was developed with a much larger number of points. The rainfall modeled over a watershed shows that the peak flow measurements and overall runoff from radar performed better than the estimated peak flow using rain gauges [34]. Additional studies have concluded that the peak discharge of streamflow computed with radar data was more accurate than those computed from rain gauges alone [35]. Thus, there is no instrument that precisely measures the amount of rainfall over a large area. The weather radar provides an estimation of the rainfall rate over larger areas.

The suggested algorithm uses TropiNet (RXM-25) data to predict the variability of the rainfall field in time and space. It is assumed that for a short time period (10, 20, and 30 min), a rain cloud behave as a rigid object, with all pixels moving in the same direction at a constant speed. Thus, the most likely future rainfall areas are estimated by tracking rain cell centroid advection in consecutive radar images. The suggested algorithm is a special kind of nonlinear model with stochastic and deterministic components. The rainfall process exhibits significant changes in time and space, and it can be characterized as a nonstationary stochastic process. To face the nonstationary characteristic of the process, parameters are estimated at every time and spatial domain.

The model consists in considering the rainfall shape data as a rectangular grid with 940 columns and 740 rows of pixels for a total of 695,600 pixels, with every pixel size is 0.06 km wide and 0.06 km long. From the grid data, select a zone of 81 pixels that was divided in squares of Δ*x* × Δ*y* pixels, where (Δ*x*) is referenced to columns of 9 pixels and (Δ*y*) to rows of 9 pixels with total zones of 8,528 (82 x 104) in every window, as shown in **Figure 6**. Several zone sizes were explored for Δ*x* and Δ*y* = {7, 9, 11, …, 25}, and it was found that the larger the zone size, the larger the number of degree of freedom. However, resolution was degraded with increased zone size.

In the model, the use of the same zone in the previous window (*t* − 1) and (*t* − 2) is necessary. Every zone (9 × 9) should have a minimum of 24 rain pixels with 20 degrees of freedom. Zones with less pixel of rain could not be selected to forecast analysis. In zones where the prediction movement suggests that there is a rainfall cell but the zone has no necessary pixels required (24 pixels), an interpolation was applied. The interpolation used was “*Kriging*” between the 25 pixels nearest to pixel that has no prognostic.

The model is defined by the equation

where (*i, j*) represents the geographic position or coordinates, latitude, and longitude of every pixel in the grid, and *k* is the zone. This process starts in pixel 1 until pixel 8,528. In every zone, unknown parameters should be determined (*α, β, Φ, δ*1, *δ*2, *δ*3): *α* is the minimum value found between previous values of *h*_{t − 1,k(i,j)} and *h*_{t − 2,k(i,j)} in their respective zones, and (*k*), *β* is the reflectivity maximum value found between previous values of *h*_{t − 1,k(i,j)} and *h*_{t − 2,k(i,j)} in the specific zone (*k*). The mathematical structure of the model was inspired on a previous work [36]. In the current work, this model was used because this scheme ensured that rainfall forecast will fall inside of the most likely rainfall intensity domain [*α, β*], which was derived by the observed local rainfall distribution.

*t* − 1). The average value was determined in every pixel into each zone. It was obtained averaging the eight pixels closest to the pixel under study. Similarly, *t* − 2).

The variable *Z*_{t − 1,k(i,j)} is the ratio between the pixels with maximum reflectivity. *Z*_{max(t − 1), k(i,j)} in every cloud or cell and the nearby pixels *Z*_{i(t − 1), k(i,j)} forming the cloud or cell, and the random variable *ε*_{t,k(i,j)} is a sequence of an unobserved random variable with mean zero and constant variance associated with the pixel (*i, j*)

The variable Phi (Φ_{t,k}) is changing in the equation at every zone (9 × 9) in each window. This variable was determined first by linearization of the nonlinear equation (Phi-initial) and after using optimization nonlinear techniques with constrains “*Sequential Quadratic Programming*” (SQP). The initial coefficient deltas (*δ*1, *δ*2 *and δ*3) were obtained through the estimation method “*least squares*” by linearization of nonlinear equation (exponential). Once the variable’s initial deltas were found, the next step is to find the variable Phi (Φ_{t,k}) initial. These values were used to forecast rainfall at one lead-time and successively with the following forecasts.

## 7. Results

There are many methods for forecasting with long lead-times as 8, 24, and 36 hr. or weekly, using autoregressive methods, moving averages, and others. However, the current study is a special kind of method for nowcasting (short time as minutes). In the western Puerto Rico occur sudden precipitations with short durations due to atmospheric conditions and topographic features at a given locations. Precipitation events may develop, occur, and dissipate immediately and the duration will be about 1, 2, or 3 hr.

Knowing the precipitation characteristics, the nowcasting model developed in the current research only needs two lag times for prediction. This means that the model has the capacity to forecast the rainfall even if the duration is very short. The developed model is presenting the best prediction when the lead-time is 10 min. The postulated rainfall nowcasting algorithm involves two major tasks: a) predicting the future location of the rain pixels and b) predicting rainfall at each pixel.

**Figure 7** shows the cloud motion comparison between observed (right) movement and estimated (left) movement at storm date (March 28, 2012; 17:10 hr.).

The point black is the centroid at initial time, and the red point is the centroid at the final time. In some cases, there is more than one clouds centroid, and so these are more than one black and red points in the figure. This happens when the division cloud method has detected more than one cloud system within the area.

For all events, the best results were presented with a prediction of 10 min (see **Figure 8**). The western Puerto Rico area geographical position makes it susceptible to sudden rainfalls that are changing rapidly in time and space. Due to this change, a lead-time of 10 min is the time prediction more adequate to this precipitation class. A larger lead-time results in greater statistical errors. Contrarily, using a lead-time smaller than 10 min, the purpose of flood alert system will be annulled by the absence of time to evacuation.

It is important to mention that the algorithm to forecast precipitation uses a sequence of the observed rainfall data to estimate the movement direction and size of the cloud or cell, and then using the main equation, rainfall is estimated in each pixel within every zone. Thereby, the suggested regression model was developed under the following assumption: It is expected that in a short time (10 min) period, a rain cloud behaves approximately as a rigid object and the cloud rain pixels move in a constant speed and direction. Thus, the most likely future rainfall areas can be estimated by using the advection of the centroids of the rain cells in consecutive images. The current estimation reflectivity is a function of the previous reflectivity images observed. Rainfall nowcasting algorithm task is predicting rainfall rate at each pixel.

This methodology was applied to four unknown parameters (*δ*1, *δ*2, *δ*3 *and* Φ) to find the optimum values with a bounded constraint: first, linearize the main equation; second, identify the initial point through a nonlinear regression model where the phi (Φ) is temporarily ignored and the delta values initially are obtained by solving the linear regression; and third, find the optimum values using a constrained nonlinear optimization technique to estimate the final parameter set for each zone (9 × 9) and every window where the phi (Φ) parameter is a bias correction factor introduced in the optimization. The optimum parameters for the nonlinear regression model were estimated by solving a constrained nonlinear optimization problem (*fmincon*). The derived initial point was ingested into the constrained nonlinear subroutine to facilitate convergence, the delta parameters were restricted to be positives, and phi parameter was restricted to be in the range of 0–1.1 values.

An analysis for the nowcasting requires a combination of meteorological and hydrological statistics, which permits a better understanding of behavior of the spatial and temporal accuracy of storm prediction. A good nowcasting includes accuracy of the spatial and the temporal levels and accuracy of the predicted rainfall intensity. Model performance criteria for the prediction required quantitative comparison measures; these measures include ten storms.

The accuracy of rainfall prediction of each pixel can be measured by decomposing the rainfall process into sequences of discrete and continuous random variables, i.e., the presence or absence of rainfall events and rainfall intensity. Examples of quantitative measures used in the current research include Contingency table, mean square error (MSE), root mean square error (RMSE), bias ratio (BR), and mean absolute error (MAE). These measures will be discussed in detail below.

The joint distribution of the forecast and observations has fundamental interest with respect to the verification of forecasts. In the most practical setting, both the forecast and observations are discrete variables, even if the forecasts and observations are not already discrete quantities.

For lead-times of 10, 20, and 30 min, the storms provide an average hit rate (*HR*) of 0.90, 0.86, and 0.84, respectively. The hit rate score is the fraction of observed events that is forecast correctly. It ranges from 0 at the poor end to 1 at the good end.

The probability of detection (*POD*) of storms varies from 0.61, 0.50, and 0.41, while the false alarm rates (*FAR*) are 0.27, 0.38, and 0.46 for lead-times of 10, 20, and 30 min, respectively.

**Figure 9** shows *POD* values and *FAR* values for the complete set of storms. In the ideal situation, *POD* should approach to 1, while the *FAR* results should approach to 0.

The performance indices of 0.74, 0.66, and 0.60 for 10, 20, and 30 min, respectively, for the model, are shown in **Figure 10**.

Similarly, the hit rates (*HR*) of the model for the all storms were 0.90, 0.86, and 0.84 for the 10, 20, and 30 min. The root mean square error (*RMSE*) and bias ratio (*BR*) measure the accuracy of the simulation for all 10 studied events, which furthermore shows the corresponding average values for each lead-times of 10, 20, and 30 min, respectively. The *RMSE* average values are 0.026, 0.077, and 0.144 mm, and the Bias average values are 0.97, 0.98, and 1.04 for lead-times of 10, 20, and 30 min, respectively.

The estimation of bias ratio for a lead-time of 30 min presents an average overestimation prediction, while the estimation of bias ratio for a lead-times of 10 and 20 min shows sub-estimation.

The forecast results present the same tendency of the observed data where the peaks with more precipitation in TropiNet events are coinciding with the forecasted data. They are in good agreement considering that the prediction is in short time and space.

The hydrological model *Vflo* required the ensemble of various layers that perform the physical and topographic characteristics of the basin area. These layers are formed by parameters that were previously presented as effective porosity, hydraulic conductivity, wetting front, roughness, soil depth, and initial saturation, which can be most sensitive in the watershed. Spatially distributed parameter and input from radar rainfall require new methods for adjustment in order to minimize differences between simulated and observed hydrographs. The hydraulic roughness (*n*), hydraulic conductivity (*K*), and initial saturation (*θ*) are the most sensitive parameters of the hydrological model. These values are estimated from physical properties of the watershed adjusted to reproduce system behavior [19]. The hydraulic conductivity controls the total amount of water that will be split into the surface runoff. The hydraulic roughness affects the peak flow, and the time to peak and initial saturation is related to the existing humidity into the soil. Study of model sensitivity was done for the watershed to identify response sensitivity for peak flow to each storm changing the multiplicative factor in the parameters. When the adjustment factor is above 1, the range of change in peak flow falls and tends to remain constant or with a minimum change in the peak flow, just below the referenced value. The analysis suggests that the initial saturation is the parameter with the highest sensitivity in the peak flow for different storms with short duration. Initial saturation is a parameter that depends on how many storms have occurred previously to the studied storm (antecedent soil moisture). Different results are possible to obtain with a sample of continuous storms.

Similar results were founded in peak flow with variations of roughness and hydraulic conductivity for all events. Low variations were founded in peak flow when the adjustment factor takes values greater than one.

## 8. Conclusions

The best statistical results were found in the rainfall nowcasting model with a lead-time of 10 min as expected. It is well known that prediction of sudden storms using rainfall nowcasting models represents the category that is the most difficult to predict, and consequently, providing accurate flash flood warnings from these types of storms is a major challenge.

The nowcasting model has a limitation in the timeshift because we are assuming that the cloud is a rigid object and that the cloud speed is constant, when in reality these parameters could vary. To find the actual weather conditions, more atmospheric parameters would need to be taken into account. In fact, cloud speed depends on its formation and other physical parameters that are constantly changing [37]. These factors should be taken into account in future works.