pRecipe was conceived back in 2020 as part of MRVG’s
doctoral dissertation at the Faculty of Environmental Sciences, Czech
University of Life Sciences Prague, Czechia. Designed with reproducible
science in mind, pRecipe facilitates the download,
exploration, visualization, and analysis of multiple
precipitation data products across various spatiotemporal scales.
 ~The Global Water Cycle Budget | Vargas Godoy et al. (2021)
“Like civilization and technology, our understanding of the global water cycle has been continuously evolving, and we have adapted our quantification methods to better exploit new technological resources. The accurate quantification of global water fluxes and storage is crucial in studying the global water cycle.”
Like many other R packages, pRecipe has some system
requirements:
The pRecipe database hosts 27 different precipitation data
sets: seven gauge-based, eight satellite-based, seven reanalysis, and
five hydrological model precipitation products. Their native
specifications, links to their providers, and their
respective references are detailed in the following subsections. We have
already homogenized them, compacted each to a single file, and stored them in a Zenodo repository
under the following naming convention:
<data set>_<variable>_<units>_<coverage>_<start date>_<end date>_<resolution>_<time step>.nc
The pRecipe data collection was homogenized to these
specifications:
- <variable> = total precipitation (tp)
- <units> = millimeters (mm)
- <resolution> = 0.25°
- <time step> = monthly

E.g., GPCP v2.3 (Adler et al. 2018) would be:
gpcp_tp_mm_global_197901_202205_025_monthly.nc
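Because every file follows the same eight-field pattern, its metadata can be recovered from the name alone. The snippet below is a minimal sketch in base R; `parse_precip_name` is a hypothetical helper written for illustration, not a pRecipe function, and it assumes that no field contains an underscore.

```r
# Hypothetical helper (not part of pRecipe): split a file name that follows
# the naming convention above into its eight components.
parse_precip_name <- function(file_name) {
  fields <- c("data_set", "variable", "units", "coverage",
              "start_date", "end_date", "resolution", "time_step")
  # Drop any directory part and the ".nc" extension, then split on underscores
  stem <- sub("\\.nc$", "", basename(file_name))
  parts <- strsplit(stem, "_", fixed = TRUE)[[1]]
  if (length(parts) != length(fields)) {
    stop("File name does not match the expected eight-field convention")
  }
  stats::setNames(as.list(parts), fields)
}

gpcp <- parse_precip_name("gpcp_tp_mm_global_197901_202205_025_monthly.nc")
gpcp$data_set    # "gpcp"
gpcp$start_date  # "197901"
gpcp$resolution  # "025", i.e. 0.25 degrees
```

Data set names that contain hyphens, such as gpm-imerg, pass through unchanged because only underscores act as separators.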

Gauge-based products:

| Data Set | Spatial Resolution | Global | Land | Ocean | Temporal Resolution | Record Length | Get Data | Reference |
|---|---|---|---|---|---|---|---|---|
| CPC-Global | 0.5° | | x | | Daily | 1979/01-2022/08 | Download | P. Xie, Chen, and Shi (2010) |
| CRU TS v4.06 | 0.5° | | x | | Monthly | 1901/01-2021/12 | Download | Harris et al. (2020) |
| EM-EARTH | 0.1° | | x | | Daily | 1950/01-2019/12 | Download | Tang, Clark, and Papalexiou (2022) |
| GHCN v2 | 5° | | x | | Monthly | 1900/01-2015/05 | Download | Peterson and Vose (1997) |
| GPCC v2020 | 0.25° | | x | | Monthly | 1891/01-2022/08 | Download | Schneider et al. (2011) |
| PREC/L | 0.5° | | x | | Monthly | 1948/01-2022/08 | Download | Chen et al. (2002) |
| UDel v5.01 | 0.5° | | x | | Monthly | 1901/01-2017/12 | Download | Willmott and Matsuura (2001) |

Satellite-based products:

| Data Set | Spatial Resolution | Global | Land | Ocean | Temporal Resolution | Record Length | Get Data | Reference |
|---|---|---|---|---|---|---|---|---|
| CHIRPS v2.0 | 0.05° | | 50°S-50°N | | Monthly | 1981/01-2022/07 | Download | Funk et al. (2015) |
| CMAP | 2.5° | x | x | x | Monthly | 1979/01-2022/07 | Download | Pingping Xie and Arkin (1997) |
| CMORPH | 0.25° | 60°S-60°N | 60°S-60°N | 60°S-60°N | Daily | 1998/01-2021/12 | Download | Joyce et al. (2004) |
| GPCP v2.3 | 0.5° | x | x | x | Monthly | 1979/01-2022/05 | Download | Adler et al. (2018) |
| GPM IMERGM v06 | 0.1° | x | x | x | Monthly | 2000/06-2020/12 | Download | G. J. Huffman et al. (2019) |
| MSWEP v2.8 | 0.1° | x | x | x | Monthly | 1979/02-2022/06 | Download | Beck et al. (2019) |
| PERSIANN-CDR | 0.25° | 60°S-60°N | 60°S-60°N | 60°S-60°N | Monthly | 1983/01-2022/06 | Download | Ashouri et al. (2015) |
| TRMM 3B43 v7 | 0.25° | 50°S-50°N | 50°S-50°N | 50°S-50°N | Monthly | 1998/01-2019/12 | Download | George J. Huffman et al. (2010) |

Reanalysis products:

| Data Set | Spatial Resolution | Global | Land | Ocean | Temporal Resolution | Record Length | Get Data | Reference |
|---|---|---|---|---|---|---|---|---|
| 20CR v3 | 1° | x | x | x | Monthly | 1836/01-2015/12 | Download | Slivinski et al. (2019) | 
| ERA-20C | 1.125° | x | x | x | Monthly | 1900/01-2010/12 | Download | Poli et al. (2016) | 
| ERA5 | 0.25° | x | x | x | Monthly | 1959/01-2021/12 | Download | Hersbach et al. (2020) | 
| JRA-55 | 1.25° | x | x | x | Monthly | 1958/01-2021/12 | Download | Kobayashi et al. (2015) | 
| MERRA-2 | 0.5° x 0.625° | x | x | x | Monthly | 1980/01-2023/01 | Download | Gelaro et al. (2017) | 
| NCEP/NCAR R1 | 1.875° | x | x | x | Monthly | 1948/01-2022/08 | Download | Kalnay et al. (1996) | 
| NCEP/DOE R2 | 1.875° | x | x | x | Monthly | 1979/01-2022/08 | Download | Kanamitsu et al. (2002) | 

Hydrological model products:

| Data Set | Spatial Resolution | Global | Land | Ocean | Temporal Resolution | Record Length | Get Data | Reference |
|---|---|---|---|---|---|---|---|---|
| FLDAS | 0.1° | | x | | Monthly | 1982/01-2021/12 | Download | McNally et al. (2017) |
| GLDAS CLSM v2.0 | 0.25° | | x | | Daily | 1948/01-2014/12 | Download | Rodell et al. (2004) |
| GLDAS NOAH v2.0 | 0.25° | | x | | Monthly | 1948/01-2014/12 | Download | Rodell et al. (2004) |
| GLDAS VIC v2.0 | 1° | | x | | Monthly | 1948/01-2014/12 | Download | Rodell et al. (2004) |
| TerraClimate | 4 km | | x | | Monthly | 1958/01-2021/12 | Download | Abatzoglou et al. (2018) |

In this introductory recipe we will first download the GPM-IMERGM data set. We will then subset the downloaded data over Central Europe for the 2001-2010 period, and crop it to the national scale for Czechia. In the next step, we will generate time series for our data sets and conclude with the visualization of our data.
NOTE: While the functions in pRecipe
are intended to work directly with its data inventory, they can handle
most other precipitation data sets in “.nc” format, as well as any other
“.nc” file generated by its functions.
Downloading the entire data collection or only a few data sets is
straightforward. You call the download_data
function, which has four arguments: data_name,
destination, domain, and time_res.
Let’s download the GPM-IMERGM data set and inspect its content with
show_info:
download_data(data_name = 'gpm-imerg')
gpm_global <- raster::brick('gpm-imerg_tp_mm_global_200006_202012_025_monthly.nc')
show_info(gpm_global)
[1] "class      : RasterBrick "
[2] "dimensions : 720, 1440, 1036800, 247  (nrow, ncol, ncell, nlayers)"
[3] "resolution : 0.25, 0.25  (x, y)"
[4] "extent     : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)"
[5] "crs        : +proj=longlat +datum=WGS84 "
[6] "source     : gpm-imerg_tp_mm_global_200006_202012_025_monthly.nc "
[7] "names      : X2000.06.01, X2000.07.01, X2000.08.01, X2000.09.01, X2000.10.01, X2000.11.01, X2000.12.01, X2001.01.01, X2001.02.01, X2001.03.01, X2001.04.01, X2001.05.01, X2001.06.01, X2001.07.01, X2001.08.01, ... "
[8] "Date/time  : 2000-06-01, 2020-12-01 (min, max)"
[9] "varname    : tp "
Once we have downloaded our database, we can start processing the data with:

- subset_spacetime to subset the data in time and space.
- subset_space to subset the data to the region of interest.
- subset_time to select the years of interest.
- mon_to_year to aggregate the data from monthly into annual.
- rescale_data to go from the native resolution (0.25°) to coarser ones (e.g., 0.5°, 1°, 1.5°, 2°).
- make_ts to generate a time series by taking the area-weighted average over each time step.

To subset our data to a desired region and period of interest, we use
the subset_spacetime function, which has four arguments:
data, years, bbox, and autosave.
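To make the effect of bbox and years concrete, here is a minimal base R sketch of the index arithmetic behind a space-time subset of the global 0.25° monthly GPM-IMERGM grid. It is an illustration only: it assumes that bbox is ordered as xmin, xmax, ymin, ymax, that cells are selected when their centre falls inside the box, and that years gives the first and last year to keep. It does not claim to reproduce how subset_spacetime is implemented.

```r
# Cell-centre coordinates of a global 0.25-degree grid
lon <- seq(-179.875, 179.875, by = 0.25)
lat <- seq(-89.875, 89.875, by = 0.25)

# One date per monthly layer, 2000-06 to 2020-12
dates <- seq(as.Date("2000-06-01"), as.Date("2020-12-01"), by = "month")
layer_year <- as.integer(format(dates, "%Y"))

bbox  <- c(2, 28, 42, 58)   # assumed order: xmin, xmax, ymin, ymax
years <- c(2001, 2010)      # assumed meaning: first and last year to keep

keep_lon  <- lon > bbox[1] & lon < bbox[2]
keep_lat  <- lat > bbox[3] & lat < bbox[4]
keep_time <- layer_year >= years[1] & layer_year <= years[2]

# Rows, columns, and layers that survive the subset
c(nrow = sum(keep_lat), ncol = sum(keep_lon), nlayers = sum(keep_time))
```

For a box of 26° by 16° at 0.25° resolution and ten full years of monthly data, this yields 64 rows, 104 columns, and 120 layers.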
Let’s subset the GPM-IMERGM data set over Central Europe (2,28,42,58)
for the 2001-2010 period, and inspect its content with
show_info:
gpm_subset <- subset_spacetime(gpm_global, years = c(2001, 2010), bbox = c(2,28,42,58))
show_info(gpm_subset)
[1] "class      : RasterBrick "
[2] "dimensions : 64, 104, 6656, 120  (nrow, ncol, ncell, nlayers)"
[3] "resolution : 0.25, 0.25  (x, y)"
[4] "extent     : 2, 28, 42, 58  (xmin, xmax, ymin, ymax)"
[5] "crs        : +proj=longlat +datum=WGS84 "
[6] "source     : memory"
[7] "names      : X2001.01.01,  X2001.02.01,  X2001.03.01,  X2001.04.01,  X2001.05.01,  X2001.06.01,  X2001.07.01,  X2001.08.01,  X2001.09.01,  X2001.10.01,  X2001.11.01,  X2001.12.01,  X2002.01.01,  X2002.02.01,  X2002.03.01, ... "
[8] "min values : 1.272205e+01, 4.698483e+00, 5.927317e+00, 2.240815e+00, 1.315575e+01, 1.301118e+00, 3.831070e+00, 4.547474e-13, 2.739577e+01, 1.662540e+00, 2.002276e+01, 1.084265e+00, 4.843051e+00, 3.975639e+00, 5.638179e+00, ... "
[9] "max values :     443.4645,     158.5196,     374.7221,     229.5028,     163.2903,     251.5495,     330.9900,     336.4113,     456.0420,     454.0903,     452.1386,     236.0807,     277.7888,     255.8143,     195.8183, ... "
[10] "time       : 2001-01-01, 2010-12-01 (min, max)"
To further crop our data to a desired polygon other than a rectangle,
we use the crop_data function, which has three arguments:
x, shp_path, and autosave.
Let’s crop our GPM-IMERGM subset to cover only Czechia with the
respective shapefile, and inspect its content with show_info:
[1] "class      : RasterBrick "
[2] "dimensions : 64, 104, 6656, 480  (nrow, ncol, ncell, nlayers)"
[3] "resolution : 0.25, 0.25  (x, y)"
[4] "extent     : 2, 28, 42, 58  (xmin, xmax, ymin, ymax)"
[5] "crs        : +proj=longlat +datum=WGS84 "
[6] "source     : memory"
[7] "names      : X2001.01.01, X2001.02.01, X2001.03.01, X2001.04.01, X2001.05.01, X2001.06.01, X2001.07.01, X2001.08.01, X2001.09.01, X2001.10.01, X2001.11.01, X2001.12.01, X2002.01.01, X2002.02.01, X2002.03.01, ... "
[8] "min values :   43.226040,   30.070290,   65.995613,   46.767975,   44.382591,   52.406155,   83.416138,   51.177319,   88.692894,   14.673723,   49.876202,   55.442097,   21.179314,   57.682911,   33.612221, ... "
[9] "max values :    89.99401,    70.54952,   158.95328,   106.25800,    90.21087,   135.75002,   248.73044,   138.78595,   158.15816,    51.39417,   113.63100,   141.96646,    77.34425,   162.56750,   132.85863, ... "
[10] "time       : 2001-01-01, 2010-12-01 (min, max)"
To make a time series out of our data, we use the
make_ts function, which has three arguments: data,
name, and autosave.
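The area-weighted average matters because cells on a regular lon-lat grid shrink toward the poles. The base R sketch below illustrates the idea for a single toy layer, assuming weights proportional to the cosine of the cell-centre latitude. The values are made up for illustration, and the exact weighting used inside make_ts is not shown in this document.

```r
# Toy layer: 4 latitudes x 3 longitudes, made-up precipitation values in mm
lat <- c(-60, -20, 20, 60)
precip <- matrix(c( 50,  50,  50,
                   120, 120, 120,
                   120, 120, 120,
                    50,  50,  50), nrow = 4, byrow = TRUE)

# On a regular lon-lat grid, cell area is proportional to cos(latitude).
# matrix() fills column by column, so every row i receives the weight w[i].
w <- cos(lat * pi / 180)
weights <- matrix(w, nrow = nrow(precip), ncol = ncol(precip))

# Weighted mean that ignores missing cells in both numerator and denominator
area_mean  <- sum(precip * weights, na.rm = TRUE) / sum(weights[!is.na(precip)])
plain_mean <- mean(precip, na.rm = TRUE)

c(area_weighted = area_mean, unweighted = plain_mean)
```

The unweighted mean treats the small high-latitude cells as equal to the large low-latitude ones, so here it underestimates the spatial average. Repeating the weighted mean for every layer gives one value per time step, which is the shape of the time series shown in this recipe.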
Let’s generate the time series for our three different GPM-IMERGM data sets (Global, Central Europe, and Czechia), and inspect the first 12 rows of each:
          date     value           name            type
 1: 2000-06-01  93.60162 GPM IMERGM v06 Satellite-based
 2: 2000-07-01  96.01442 GPM IMERGM v06 Satellite-based
 3: 2000-08-01  94.16792 GPM IMERGM v06 Satellite-based
 4: 2000-09-01  90.38524 GPM IMERGM v06 Satellite-based
 5: 2000-10-01  93.90120 GPM IMERGM v06 Satellite-based
 6: 2000-11-01  93.55994 GPM IMERGM v06 Satellite-based
 7: 2000-12-01  96.68792 GPM IMERGM v06 Satellite-based
 8: 2001-01-01  94.71431 GPM IMERGM v06 Satellite-based
 9: 2001-02-01  85.94786 GPM IMERGM v06 Satellite-based
10: 2001-03-01  96.12793 GPM IMERGM v06 Satellite-based
11: 2001-04-01  96.99244 GPM IMERGM v06 Satellite-based
12: 2001-05-01 100.50446 GPM IMERGM v06 Satellite-based
          date     value           name            type
 1: 2001-01-01  96.67884 GPM IMERGM v06 Satellite-based
 2: 2001-02-01  58.80170 GPM IMERGM v06 Satellite-based
 3: 2001-03-01  96.04202 GPM IMERGM v06 Satellite-based
 4: 2001-04-01  80.09136 GPM IMERGM v06 Satellite-based
 5: 2001-05-01  55.94958 GPM IMERGM v06 Satellite-based
 6: 2001-06-01  92.74124 GPM IMERGM v06 Satellite-based
 7: 2001-07-01  95.06115 GPM IMERGM v06 Satellite-based
 8: 2001-08-01  76.70639 GPM IMERGM v06 Satellite-based
 9: 2001-09-01 141.68700 GPM IMERGM v06 Satellite-based
10: 2001-10-01  62.51384 GPM IMERGM v06 Satellite-based
11: 2001-11-01  97.12927 GPM IMERGM v06 Satellite-based
12: 2001-12-01  71.00100 GPM IMERGM v06 Satellite-based
          date     value           name            type
 1: 2001-01-01  59.36666 GPM IMERGM v06 Satellite-based
 2: 2001-02-01  50.59915 GPM IMERGM v06 Satellite-based
 3: 2001-03-01  96.69115 GPM IMERGM v06 Satellite-based
 4: 2001-04-01  73.23477 GPM IMERGM v06 Satellite-based
 5: 2001-05-01  64.74244 GPM IMERGM v06 Satellite-based
 6: 2001-06-01  86.48493 GPM IMERGM v06 Satellite-based
 7: 2001-07-01 127.52908 GPM IMERGM v06 Satellite-based
 8: 2001-08-01  94.31304 GPM IMERGM v06 Satellite-based
 9: 2001-09-01 119.28491 GPM IMERGM v06 Satellite-based
10: 2001-10-01  30.82040 GPM IMERGM v06 Satellite-based
11: 2001-11-01  72.33474 GPM IMERGM v06 Satellite-based
12: 2001-12-01  91.77480 GPM IMERGM v06 Satellite-based
Either after we have processed our data as required or right after downloading it, we have seven different options to visualize our data:

- plot_taylor to see a Taylor diagram (requires a reference data set).
- plot_map to see the Cartesian lon-lat map of the first raster layer.
- plot_line to see the average time series.
- plot_heatmap to see a heatmap of all monthly values.
- plot_box to see a seasonal boxplot.
- plot_density to see the empirical density of monthly precipitation.
- plot_summary to see the line, heatmap, box, and density plots together in a single plot.

Let’s plot our three different GPM-IMERGM data sets (Global, Central Europe, and Czechia).
To see a map of any data set, raw or processed, we use
plot_map, which takes only one layer of the RasterBrick as
input.
To draw a time series generated by make_ts, we use any
of the options below, which take only a “.csv” file
generated by make_ts.
Once we have generated our time series, we can start evaluating the data with:

- pod to calculate the probability of detection.
- far to calculate the false alarm rate.
- csi to calculate the critical success index.
- nse to calculate the Nash–Sutcliffe efficiency.

The above functions have three arguments: x, ref,
and th (except for nse).
make_ts.
NOTE: These metrics are not demonstrated in the current demo because they are intended for data of higher temporal resolution than monthly, e.g., daily or subdaily (coming soon).
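For orientation, the sketch below computes these four scores in base R using their common textbook definitions on a made-up daily series. The helper names `event_scores` and `nse_score` are hypothetical, and two points are assumptions: an event is a value at or above th, and FAR is false alarms divided by all estimated events. pRecipe's own pod, far, csi, and nse may define these details differently.

```r
# Hypothetical helper: categorical scores for events defined as value >= th
event_scores <- function(x, ref, th) {
  est_event <- x >= th
  ref_event <- ref >= th
  hits         <- sum(est_event & ref_event)
  misses       <- sum(!est_event & ref_event)
  false_alarms <- sum(est_event & !ref_event)
  c(pod = hits / (hits + misses),
    far = false_alarms / (hits + false_alarms),
    csi = hits / (hits + misses + false_alarms))
}

# Hypothetical helper: Nash-Sutcliffe efficiency of x against ref
nse_score <- function(x, ref) {
  1 - sum((x - ref)^2) / sum((ref - mean(ref))^2)
}

# Made-up daily precipitation (mm): estimate x versus reference ref
ref <- c(0, 5, 12, 0, 3, 20, 0, 1)
x   <- c(0, 4, 10, 2, 0, 25, 0, 0)

scores <- event_scores(x, ref, th = 1)
scores           # pod = 0.6, far = 0.25, csi = 0.5
nse_score(x, ref)
```

With th = 1 the toy series contains three hits, two misses, and one false alarm, which gives the three categorical scores in the comment. NSE needs no threshold because it compares the values directly.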
More functions for data processing and analysis and expanding the database.