Getting started#
The xdatasets library enables users to effortlessly access a vast collection of earth observation datasets that are compatible with xarray formats.

The library adopts an opinionated approach to data querying and caters to the specific needs of certain user groups, such as hydrologists, climate scientists, and engineers. One of the functionalities of xdatasets is the ability to extract data at a specific location or within a designated region, such as a watershed or municipality, while also enabling spatial and temporal operations.
To use xdatasets, users must employ a query. For instance, a straightforward query to extract the variables t2m (2m temperature) and tp (total precipitation) from the era5_reanalysis_single_levels dataset at two geographical positions (Montreal and Toronto) could be as follows:
query = {
    "datasets": {"era5_reanalysis_single_levels": {"variables": ["t2m", "tp"]}},
    "space": {
        "clip": "point",  # bbox, point or polygon
        "geometry": {
            "Montreal": (45.508888, -73.561668),
            "Toronto": (43.651070, -79.347015),
        },
    },
}
An example of a more complex query would look like the one below.

Note: Don’t worry! Below, you’ll find additional examples that will assist in understanding each parameter in the query, as well as the possible combinations.

This query calls the same variables as above. However, instead of specifying geographical positions, a geopandas.GeoDataFrame is used to provide features (such as shapefiles or GeoJSON) for extracting data within each of them. Each polygon is identified using the unique identifier Station, and a spatial average is computed within each one (averaging: True). The dataset, initially at an hourly time step, is converted into a daily time step while applying one or more temporal aggregations for each variable as prescribed in the query. xdatasets ultimately returns the dataset for the specified date range and time zone.
query = {
    "datasets": {"era5_reanalysis_single_levels": {"variables": ["t2m", "tp"]}},
    "space": {
        "clip": "polygon",  # bbox, point or polygon
        "averaging": True,  # spatial average of the variables within each polygon
        "geometry": gdf,
        "unique_id": "Station",  # unique column name in geodataframe
    },
    "time": {
        "timestep": "D",
        "aggregation": {"tp": np.nansum, "t2m": [np.nanmax, np.nanmin]},
        "start": "2000-01-01",
        "end": "2020-05-31",
        "timezone": "America/Montreal",
    },
}
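The hourly-to-daily conversion with per-variable aggregations that this query describes can be sketched with plain pandas. This is a toy illustration with made-up data, not the xdatasets internals (the library performs the equivalent on xarray objects):

```python
import numpy as np
import pandas as pd

# Two days of toy hourly data standing in for ERA5 variables (assumed values).
idx = pd.date_range("2000-01-01", periods=48, freq="h")
hourly = pd.DataFrame({"tp": np.ones(48), "t2m": np.arange(48.0)}, index=idx)

# Apply one or more aggregations per variable, as in the query's
# "aggregation" mapping, while resampling from hourly to daily.
daily = hourly.resample("D").agg({"tp": [np.nansum], "t2m": [np.nanmax, np.nanmin]})
print(daily)
```

Each variable ends up with one column per reduction, which mirrors how xdatasets suffixes variable names with the reduction applied to them.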
Query climate datasets#
In order to use xdatasets, you must import at least xdatasets, pandas, geopandas, and numpy. Additionally, we import pathlib to interact with files.
[1]:
import os
os.environ["USE_PYGEOS"] = "0"
from pathlib import Path
import geopandas as gpd
# Visualization
import hvplot.pandas # noqa
import hvplot.xarray # noqa
import numpy as np
import pandas as pd
import panel as pn # noqa
import xdatasets as xd
Clip by points (sites)#
To begin with, we need to create a dictionary of sites and their corresponding geographical coordinates.
[2]:
sites = {
    "Montreal": (45.508888, -73.561668),
    "New York": (40.730610, -73.935242),
    "Miami": (25.761681, -80.191788),
}
We will then extract the tp (total precipitation) and t2m (2m temperature) variables from the era5_reanalysis_single_levels dataset for the designated sites. Afterward, we will convert the time step to daily and adjust the timezone to Eastern Time. Finally, we will limit the temporal interval.
Before proceeding with this first query, let’s quickly outline the role of each parameter:
datasets: A dictionary where datasets serve as keys and desired variables as values.
space: A dictionary that defines the necessary spatial operations to apply on user-supplied geographic features.
time: A dictionary that defines the necessary temporal operations to apply on the datasets.

For more information on each parameter, consult the API documentation.

This is what the requested query looks like:
[3]:
query = {
    "datasets": "era5_reanalysis_single_levels",
    "space": {"clip": "point", "geometry": sites},  # bbox, point or polygon
    "time": {
        "timestep": "D",  # http://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases
        "aggregation": {"tp": np.nansum, "t2m": np.nanmean},
        "start": "1959-01-01",
        "timezone": "America/Montreal",
    },
}

xds = xd.Query(**query)
Temporal operations: processing tp with era5_reanalysis_single_levels: 100%|██████████| 2/2 [00:05<00:00, 2.78s/it]
By accessing the data attribute, you can view the data obtained from the query. It’s worth noting that the variable name tp has been updated to tp_nansum to reflect the reduction operation (np.nansum) that was utilized to convert the time step from hourly to daily. Likewise, t2m was updated to t2m_nanmean.
[4]:
xds.data
[4]:
<xarray.Dataset>
Dimensions:      (spatial_agg: 1, timestep: 1, site: 3, time: 23621, source: 1)
Coordinates:
  * spatial_agg  (spatial_agg) object 'point'
  * timestep     (timestep) object 'D'
    latitude     (site) float64 45.5 40.75 25.75
    longitude    (site) float64 -73.5 -74.0 -80.25
  * site         (site) <U8 'Montreal' 'New York' 'Miami'
  * time         (time) datetime64[ns] 1959-01-01 1959-01-02 ... 2023-09-02
  * source       (source) <U29 'era5_reanalysis_single_levels'
Data variables:
    t2m_nanmean  (spatial_agg, timestep, time, site) float32 260.7 ... 301.0
    tp_nansum    (spatial_agg, timestep, time, site) float32 0.0005895 ... 0....
Attributes: (12/30)
    GRIB_NV:           0
    GRIB_Nx:           1440
    GRIB_Ny:           721
    GRIB_cfName:       unknown
    GRIB_cfVarName:    t2m
    GRIB_dataType:     an
    ...                ...
    GRIB_totalNumber:  0
    GRIB_typeOfLevel:  surface
    GRIB_units:        K
    long_name:         2 metre temperature
    standard_name:     unknown
    units:             K
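The suffixing is simply the reduction function’s __name__ appended to the variable name; a hypothetical helper illustrating the pattern (not the library’s actual code):

```python
import numpy as np

def renamed(var: str, func) -> str:
    # Append the name of the reduction used during resampling,
    # e.g. "tp" aggregated with np.nansum becomes "tp_nansum".
    return f"{var}_{func.__name__}"

print(renamed("tp", np.nansum), renamed("t2m", np.nanmean))
```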
[5]:
title = f"Comparison of total precipitation across three cities in North America from \
{xds.data.time.dt.year.min().values} to {xds.data.time.dt.year.max().values}"
xds.data.sel(
    timestep="D",
    source="era5_reanalysis_single_levels",
).hvplot(
    title=title,
    x="time",
    y="tp_nansum",
    grid=True,
    width=750,
    height=450,
    by="site",
    legend="top",
    widget_location="bottom",
)
[5]:
[6]:
title = f"Comparison of 2m temperature across three cities in North America from \
{xds.data.time.dt.year.min().values} to {xds.data.time.dt.year.max().values}"
xds.data.sel(
    timestep="D",
    source="era5_reanalysis_single_levels",
).hvplot(
    title=title,
    x="time",
    y="t2m_nanmean",
    grid=True,
    width=750,
    height=450,
    by="site",
    legend="top",
    widget_location="bottom",
)
[6]:
Clip on polygons with no averaging in space#
First, let’s explore specific polygon features. With xdatasets, you can access geographical datasets, such as watershed boundaries linked to streamflow stations. These datasets follow a nomenclature where they are named after the hydrological dataset, with "_polygons" appended. For example, if the hydrological dataset is named deh, its corresponding watershed boundaries dataset will be labeled deh_polygons. The query below retrieves all polygons for the deh_polygons dataset.
gdf = xd.Query(
    **{
        "datasets": "deh_polygons",
    }
).data
gdf
As the data is loaded into memory, the process of loading all polygons may take some time. To expedite this, we recommend employing filters, as illustrated below. It’s important to note that the filters are consistent for both hydrological and corresponding geographical datasets. Consequently, only watershed boundaries associated with existing hydrological data will be returned.
[7]:
[7]:
  Station  Superficie                                           geometry
0  042102  623.479187  POLYGON ((-78.57120 46.70742, -78.57112 46.707...
1  042103  579.479614  POLYGON ((-78.49014 46.64514, -78.49010 46.645...
Let’s examine the geographic locations of the polygon features.
[8]:
gdf.hvplot(
    geo=True,
    tiles="ESRI",
    color="Station",
    alpha=0.8,
    width=750,
    height=450,
    legend="top",
    hover_cols=["Station", "Superficie"],
)
[8]:
The following query seeks the variables t2m and tp from the era5_reanalysis_single_levels dataset, covering the period between January 1, 1959, and August 31, 1963, for the two polygons shown earlier. It is important to note that as averaging is set to False, no spatial averaging will be conducted, and a mask (raster) will be returned for each polygon.
[9]:
query = {
    "datasets": {"era5_reanalysis_single_levels": {"variables": ["t2m", "tp"]}},
    "space": {
        "clip": "polygon",  # bbox, point or polygon
        "averaging": False,  # no spatial averaging; a mask is returned per polygon
        "geometry": gdf,
        "unique_id": "Station",  # unique column name in geodataframe
    },
    "time": {
        "start": "1959-01-01",
        "end": "1963-08-31",
    },
}
xds = xd.Query(**query)
Spatial operations: processing polygon 042102 with era5_reanalysis_single_levels: : 0it [00:00, ?it/s]
Spatial operations: processing polygon 042103 with era5_reanalysis_single_levels: : 2it [00:00, 5.18it/s]
By accessing the data attribute, you can view the data obtained from the query. For each variable, the dimensions of time, latitude, longitude, and Station (the unique ID) are included. In addition, another variable called weights is returned. This variable specifies the weight that should be assigned to each pixel if spatial averaging is conducted over a mask (polygon).
[10]:
xds.data
[10]:
<xarray.Dataset>
Dimensions:    (Station: 2, time: 40896, latitude: 3, longitude: 2, source: 1)
Coordinates:
  * latitude   (latitude) float64 46.25 46.5 46.75
  * longitude  (longitude) float64 -78.5 -78.25
  * time       (time) datetime64[ns] 1959-01-01 ... 1963-08-31T23:00:00
  * Station    (Station) object '042102' '042103'
  * source     (source) <U29 'era5_reanalysis_single_levels'
Data variables:
    t2m        (Station, time, latitude, longitude) float32 260.3 259.9 ... nan
    tp         (Station, time, latitude, longitude) float32 nan nan ... 0.0 nan
    weights    (Station, latitude, longitude) float64 0.01953 0.1244 ... nan
Attributes:
    Conventions:               CF-1.6
    history:                   2022-11-10 02:03:41 GMT by grib_to_netcdf-2.25...
    pangeo-forge:inputs_hash:  c4e1de94d7bedf0a63629db8fa0633c03b7e266149e97f...
    pangeo-forge:recipe_hash:  66be9c20b44c1a0fca83c2dd2a6f147aecc5be14590f1f...
    pangeo-forge:version:      0.9.4
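When spatial averaging is later requested, these weights feed a straightforward weighted mean. A numpy sketch with invented pixel values and weights (NaN marks pixels outside the polygon’s bounding area):

```python
import numpy as np

# Hypothetical 3x2 (latitude x longitude) grid of a variable and its polygon weights.
t2m = np.array([[260.3, 259.9], [261.0, np.nan], [262.5, 263.1]])
weights = np.array([[0.02, 0.12], [0.40, np.nan], [0.30, 0.16]])

# Weighted spatial mean over the mask: ignore NaNs, normalise by the weight sum.
valid = ~np.isnan(t2m) & ~np.isnan(weights)
mean = np.sum(t2m[valid] * weights[valid]) / np.sum(weights[valid])
print(round(mean, 2))
```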
Weights are much easier to comprehend visually, so let’s examine the weights returned for station 042102. Notice that when selecting a single feature (Station 042102 in this case), the shape of our spatial dimensions is reduced to a 3x2 pixel area (latitude x longitude) that encompasses the entire feature.
[11]:
station = "042102"
ds_station = xds.data.sel(Station=station)
ds_clipped = xds.bbox_clip(ds_station).squeeze()
ds_clipped
[11]:
<xarray.Dataset>
Dimensions:  (time: 40896, latitude: 3, longitude: 2)
Coordinates:
  * latitude   (latitude) float64 46.25 46.5 46.75
  * longitude  (longitude) float64 -78.5 -78.25
  * time       (time) datetime64[ns] 1959-01-01 ... 1963-08-31T23:00:00
    Station    <U6 '042102'
    source     <U29 'era5_reanalysis_single_levels'
Data variables:
    t2m      (time, latitude, longitude) float32 260.3 259.9 ... 288.2 nan
    tp       (time, latitude, longitude) float32 nan nan nan ... 0.0 0.0 nan
    weights  (latitude, longitude) float64 0.01953 0.1244 ... 0.0481 nan
Attributes:
    Conventions:               CF-1.6
    history:                   2022-11-10 02:03:41 GMT by grib_to_netcdf-2.25...
    pangeo-forge:inputs_hash:  c4e1de94d7bedf0a63629db8fa0633c03b7e266149e97f...
    pangeo-forge:recipe_hash:  66be9c20b44c1a0fca83c2dd2a6f147aecc5be14590f1f...
    pangeo-forge:version:      0.9.4
[12]:
(
    (
        ds_clipped.t2m.isel(time=0).hvplot(
            title="The 2m temperature for pixels that intersect with the polygon on January 1, 1959",
            tiles="ESRI",
            geo=True,
            alpha=0.6,
            colormap="isolum",
            width=750,
            height=450,
        )
        * gdf[gdf.Station == station].hvplot(
            geo=True,
            width=750,
            height=450,
            legend="top",
            hover_cols=["Station", "Superficie"],
        )
    )
    + ds_clipped.weights.hvplot(
        title="The weights that should be assigned to each pixel when performing spatial averaging",
        tiles="ESRI",
        alpha=0.6,
        colormap="isolum",
        geo=True,
        width=750,
        height=450,
    )
    * gdf[gdf.Station == station].hvplot(
        geo=True,
        width=750,
        height=450,
        legend="top",
        hover_cols=["Station", "Superficie"],
    )
).cols(1)
[12]:
:Layout
.Overlay.I :Overlay
.WMTS.I :WMTS [Longitude,Latitude]
.Image.I :Image [longitude,latitude] (t2m)
.Polygons.I :Polygons [Longitude,Latitude] (Station,Superficie)
.Overlay.II :Overlay
.WMTS.I :WMTS [Longitude,Latitude]
.Image.I :Image [longitude,latitude] (weights)
.Polygons.I :Polygons [Longitude,Latitude] (Station,Superficie)
The two plots depicted above show the 2m temperature for each pixel that intersects with the polygon from Station 042102 and the corresponding weights to be applied to each pixel. In the lower plot, it is apparent that the majority of the polygon is situated in the central pixels, which results in those pixels having a weight of approximately 80%. It is evident that the two lower and the upper pixels have much less intersection with the polygon, which results in their respective weights being smaller (hover on the plot to verify the weights).
In various libraries, either all pixels that intersect with the geometries are kept, or only pixels with centers within the polygon are retained. However, as shown in the previous example, utilizing such methods can introduce significant biases in the final calculations.
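A toy numpy comparison makes the bias concrete: treating every intersecting pixel equally can differ substantially from the coverage-weighted average (numbers invented for illustration):

```python
import numpy as np

# Four intersecting pixels: values and the fraction of each pixel inside the polygon.
values = np.array([1.0, 10.0, 10.0, 1.0])
coverage = np.array([0.05, 0.80, 0.80, 0.05])

naive = values.mean()  # every intersecting pixel counts equally
weighted = np.sum(values * coverage) / np.sum(coverage)  # area-weighted
print(naive, round(weighted, 2))
```

Here the naive average is pulled toward the barely-intersecting edge pixels, while the weighted mean reflects what actually lies inside the polygon.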
Clip on polygons with averaging in space#
The following query seeks the variables t2m and tp from the era5_reanalysis_single_levels and era5_land_reanalysis datasets, covering the period from January 1, 1950, to the present, for the two polygons shown earlier. Note that when the averaging parameter is set to True, spatial averaging takes place. In addition, the weighted mask (raster) described earlier will be applied to generate a time series for each polygon.

Additional steps are carried out in the process, including converting the original hourly time step to a daily time step. During this conversion, various temporal aggregations are applied to each variable, and a conversion to the local time zone takes place.
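The time-zone step amounts to converting the UTC timestamps to local time before the daily resampling; a pandas sketch with toy data (not the library’s internals):

```python
import numpy as np
import pandas as pd

# One day of hourly precipitation recorded in UTC (assumed values).
idx = pd.date_range("2000-01-01", periods=24, freq="h", tz="UTC")
tp = pd.Series(np.ones(24), index=idx)

# Convert to local time first, then aggregate to daily totals.
daily = tp.tz_convert("America/Montreal").resample("D").sum()
print(daily)
```

Because local midnight differs from UTC midnight, the 24 UTC hours split across two local days, which is why converting before resampling matters.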
Note: If users prefer to pass multiple dictionaries instead of a single large one, the following format is also considered acceptable.
[13]:
datasets = {
    "era5_reanalysis_single_levels": {"variables": ["t2m", "tp"]},
    "era5_land_reanalysis": {"variables": ["t2m", "tp"]},
}
space = {
    "clip": "polygon",  # bbox, point or polygon
    "averaging": True,
    "geometry": gdf,  # 2 polygons
    "unique_id": "Station",
}
time = {
    "timestep": "D",
    "aggregation": {"tp": [np.nansum], "t2m": [np.nanmax, np.nanmin]},
    "start": "1950-01-01",
    "timezone": "America/Montreal",
}
xds = xd.Query(datasets=datasets, space=space, time=time)
/home/runner/work/xdatasets/xdatasets/xdatasets/workflows.py:51: UserWarning: "start_date" not found within input date time range. Defaulting to minimum time step in xarray object.
ds = subset_time(ds, start_date=start_time, end_date=end_time)
Spatial operations: processing polygon 042102 with era5_reanalysis_single_levels: : 0it [00:00, ?it/s]
Spatial operations: processing polygon 042103 with era5_reanalysis_single_levels: : 2it [00:00, 4.60it/s]
Temporal operations: processing tp with era5_reanalysis_single_levels: 100%|██████████| 2/2 [00:08<00:00, 4.03s/it]
Spatial operations: processing polygon 042102 with era5_land_reanalysis: : 0it [00:00, ?it/s]/home/runner/micromamba/envs/xdatasets/lib/python3.9/site-packages/xagg/aux.py:230: FutureWarning: The return type of `Dataset.dims` will be changed to return a set of dimension names in future, in order to be more consistent with `DataArray.dims`. To access a mapping from dimension names to lengths, please use `Dataset.sizes`.
bnds_tmp = xr.DataArray(data=np.zeros((ds.dims[var],2))*np.nan,
/home/runner/micromamba/envs/xdatasets/lib/python3.9/site-packages/xagg/aux.py:230: FutureWarning: The return type of `Dataset.dims` will be changed to return a set of dimension names in future, in order to be more consistent with `DataArray.dims`. To access a mapping from dimension names to lengths, please use `Dataset.sizes`.
bnds_tmp = xr.DataArray(data=np.zeros((ds.dims[var],2))*np.nan,
/home/runner/micromamba/envs/xdatasets/lib/python3.9/site-packages/xagg/core.py:253: FutureWarning: The return type of `Dataset.dims` will be changed to return a set of dimension names in future, in order to be more consistent with `DataArray.dims`. To access a mapping from dimension names to lengths, please use `Dataset.sizes`.
(ds_bnds,) = (xr.broadcast(ds.isel({d:0 for d in [k for k in ds.dims.keys() if k not in ['lat','lon','bnds']]}).
/home/runner/micromamba/envs/xdatasets/lib/python3.9/site-packages/xagg/core.py:272: FutureWarning: The return type of `Dataset.dims` will be changed to return a set of dimension names in future, in order to be more consistent with `DataArray.dims`. To access a mapping from dimension names to lengths, please use `Dataset.sizes`.
gdf_pixels = gpd.GeoDataFrame(pd.DataFrame({v:[None]*ds_bnds.dims['loc']
Spatial operations: processing polygon 042103 with era5_land_reanalysis: : 1it [00:00, 3.63it/s]/home/runner/micromamba/envs/xdatasets/lib/python3.9/site-packages/xagg/aux.py:230: FutureWarning: The return type of `Dataset.dims` will be changed to return a set of dimension names in future, in order to be more consistent with `DataArray.dims`. To access a mapping from dimension names to lengths, please use `Dataset.sizes`.
bnds_tmp = xr.DataArray(data=np.zeros((ds.dims[var],2))*np.nan,
/home/runner/micromamba/envs/xdatasets/lib/python3.9/site-packages/xagg/aux.py:230: FutureWarning: The return type of `Dataset.dims` will be changed to return a set of dimension names in future, in order to be more consistent with `DataArray.dims`. To access a mapping from dimension names to lengths, please use `Dataset.sizes`.
bnds_tmp = xr.DataArray(data=np.zeros((ds.dims[var],2))*np.nan,
/home/runner/micromamba/envs/xdatasets/lib/python3.9/site-packages/xagg/core.py:253: FutureWarning: The return type of `Dataset.dims` will be changed to return a set of dimension names in future, in order to be more consistent with `DataArray.dims`. To access a mapping from dimension names to lengths, please use `Dataset.sizes`.
(ds_bnds,) = (xr.broadcast(ds.isel({d:0 for d in [k for k in ds.dims.keys() if k not in ['lat','lon','bnds']]}).
/home/runner/micromamba/envs/xdatasets/lib/python3.9/site-packages/xagg/core.py:272: FutureWarning: The return type of `Dataset.dims` will be changed to return a set of dimension names in future, in order to be more consistent with `DataArray.dims`. To access a mapping from dimension names to lengths, please use `Dataset.sizes`.
gdf_pixels = gpd.GeoDataFrame(pd.DataFrame({v:[None]*ds_bnds.dims['loc']
Spatial operations: processing polygon 042103 with era5_land_reanalysis: : 2it [00:00, 3.86it/s]
Temporal operations: processing tp with era5_land_reanalysis: 100%|██████████| 2/2 [00:08<00:00, 4.25s/it]
[14]:
xds.data
[14]:
<xarray.Dataset>
Dimensions:      (spatial_agg: 1, timestep: 1, Station: 2, time: 27060, source: 2)
Coordinates:
  * spatial_agg  (spatial_agg) object 'polygon'
  * timestep     (timestep) object 'D'
  * Station      (Station) object '042102' '042103'
  * time         (time) datetime64[ns] 1950-01-01 1950-01-02 ... 2024-02-01
  * source       (source) <U29 'era5_land_reanalysis' 'era5_reanalysis_single...
Data variables:
    t2m_nanmax   (spatial_agg, timestep, Station, time, source) float64 272.8...
    t2m_nanmin   (spatial_agg, timestep, Station, time, source) float64 268.7...
    tp_nansum    (spatial_agg, timestep, Station, time, source) float64 0.000...
Attributes: (12/30)
    GRIB_NV:           0
    GRIB_Nx:           1440
    GRIB_Ny:           721
    GRIB_cfName:       unknown
    GRIB_cfVarName:    t2m
    GRIB_dataType:     an
    ...                ...
    GRIB_totalNumber:  0
    GRIB_typeOfLevel:  surface
    GRIB_units:        K
    long_name:         2 metre temperature
    standard_name:     unknown
    units:             K
[15]:
(
xds.data[["t2m_nanmax", "t2m_nanmin"]]
.squeeze()
.hvplot(
x="time",
groupby=["Station", "source"],
width=750,
height=400,
grid=True,
widget_location="bottom",
)
)
[15]:
The resulting dataset can be explored through the total precipitation (tp) data variable:
[16]:
(
xds.data[["tp_nansum"]]
.squeeze()
.hvplot(
x="time",
groupby=["Station", "source"],
width=750,
height=400,
grid=True,
widget_location="bottom",
color="blue",
)
)
[16]:
Bounding box (bbox) around polygons#
The following query requests the variable tp from the era5_land_reanalysis_dev dataset, covering the period between January 1, 1959, and December 31, 1970, for the bounding box that delimits the three polygons mentioned earlier.
Additional steps are carried out in the process, including converting to the local time zone.
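As a rough sketch, such a query might be assembled following the query structure shown at the start of this tutorial. The `gdf` placeholder stands for a GeoPandas GeoDataFrame holding the polygons; the exact keys for the time and timezone options are illustrative and should be checked against the xdatasets documentation:

```python
# Placeholder: in practice this would be a GeoPandas GeoDataFrame, e.g.
# gdf = geopandas.read_file("watersheds.geojson")
gdf = None

# Hypothetical query mirroring the structure shown earlier in this tutorial.
query = {
    "datasets": {"era5_land_reanalysis_dev": {"variables": ["tp"]}},
    "space": {
        "clip": "bbox",  # bbox, point or polygon
        "geometry": gdf,
    },
    "time": {
        "start": "1959-01-01",
        "end": "1970-12-31",
        "timezone": "America/Montreal",  # convert to local time zone
    },
}
```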
[17]:
[18]:
xds.data
[18]:
<xarray.Dataset>
Dimensions:    (time: 105192, latitude: 9, longitude: 11, source: 1)
Coordinates:
  * latitude   (latitude) float64 46.9 46.8 46.7 46.6 46.5 46.4 46.3 46.2 46.1
  * longitude  (longitude) float64 -78.9 -78.8 -78.7 -78.6 ... -78.1 -78.0 -77.9
  * time       (time) datetime64[ns] 1959-01-01 ... 1970-12-31T23:00:00
  * source     (source) <U20 'era5_land_reanalysis'
Data variables:
    tp         (time, latitude, longitude) float32 1.08e-07 1.08e-07 ... 0.0 0.0
Attributes:
    pangeo-forge:inputs_hash:  4c04980cf1e6af8df1d96abaacccdcdb7d2fcd2c8605dc...
    pangeo-forge:recipe_hash:  5fa22023abf774ec274a71534d060ea1643071fbcf580f...
    pangeo-forge:version:      0.9.4
    timezone:                  America/Montreal
Let’s find out which day (24-hour period) was the rainiest in the entire region for the data retrieved in the previous cell.
[19]:
indexer = (
    xds.data.sel(source="era5_land_reanalysis")
    .tp.sum(["latitude", "longitude"])  # total precipitation over the region
    .rolling(time=24)  # 24-hour accumulation window
    .sum()
    .argmax("time")  # hour ending the wettest 24-hour window
    .values
)
xds.data.isel(time=indexer).time.dt.date.values.tolist()
[19]:
datetime.date(1961, 9, 26)
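The rolling-sum/argmax pattern used above can be illustrated on a small synthetic series (the data below is made up purely for demonstration and is not part of xdatasets):

```python
import numpy as np
import xarray as xr

# Toy hourly series: 96 dry hours with one 24-hour wet spell at indices 40-63.
tp = xr.DataArray(np.zeros(96), dims="time")
tp[40:64] = 1.0

# Rolling 24-hour sum; argmax returns the index of the *last* hour of the
# wettest 24-hour window (leading incomplete windows are NaN and skipped).
idx = int(tp.rolling(time=24).sum().argmax("time").values)
print(idx)  # → 63
```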
Let’s visualise the evolution of the hourly precipitation during that day. Note that each image (raster) delimits exactly the bounding box required to cover all polygons in the query. For full interactivity, run the code in a Jupyter Notebook.
[20]:
da = xds.data.tp.isel(time=slice(indexer - 24, indexer))
da = da.where(da > 0.0001, drop=True)  # keep only pixels with measurable precipitation
(da * 1000).squeeze().hvplot.quadmesh(  # ERA5 tp is in metres; convert to mm
    width=750,
    height=450,
    geo=True,
    tiles="ESRI",
    groupby=["time"],
    legend="top",
    cmap="gist_ncar",
    widget_location="bottom",
    widget_type="scrubber",
    dynamic=False,
    clim=(0.01, 10),
)
[20]:
Query hydrological datasets#
Hydrological queries are still being tested and the output format is likely to change. Stay tuned!
[21]:
<xarray.Dataset>
Dimensions:        (id: 744, variable: 2, spatial_agg: 2, timestep: 1, time_agg: 1, source: 1, time: 60631)
Coordinates: (12/15)
    drainage_area  (id) float32 dask.array<chunksize=(744,), meta=np.ndarray>
    end_date       (variable, id, spatial_agg, timestep, time_agg, source) datetime64[ns] dask.array<chunksize=(2, 744, 2, 1, 1, 1), meta=np.ndarray>
  * id             (id) object '010101' '010801' '010802' ... '104804' '120201'
    latitude       (id) float32 dask.array<chunksize=(744,), meta=np.ndarray>
    longitude      (id) float32 dask.array<chunksize=(744,), meta=np.ndarray>
    name           (id) object dask.array<chunksize=(744,), meta=np.ndarray>
    ...             ...
  * spatial_agg    (spatial_agg) object 'point' 'watershed'
    start_date     (variable, id, spatial_agg, timestep, time_agg, source) datetime64[ns] dask.array<chunksize=(2, 744, 2, 1, 1, 1), meta=np.ndarray>
  * time           (time) datetime64[ns] 1860-01-01 1860-01-02 ... 2025-12-31
  * time_agg       (time_agg) object 'mean'
  * timestep       (timestep) object 'D'
  * variable       (variable) object 'level' 'streamflow'
Data variables:
    level          (id, time, variable, spatial_agg, timestep, time_agg, source) float32 dask.array<chunksize=(1, 60631, 1, 1, 1, 1, 1), meta=np.ndarray>
    streamflow     (id, time, variable, spatial_agg, timestep, time_agg, source) float32 dask.array<chunksize=(1, 60631, 1, 1, 1, 1, 1), meta=np.ndarray>
[22]:
ds = (
xd.Query(
**{
"datasets": {
"deh": {
"id": ["020*"],
"regulated": ["Natural"],
"variables": ["streamflow"],
}
},
"time": {"start": "1970-01-01", "minimum_duration": (15 * 365, "d")},
}
)
.data.squeeze()
.load()
)
ds
[22]:
<xarray.Dataset>
Dimensions:        (id: 5, time: 20454)
Coordinates: (12/15)
    drainage_area  (id) float32 1.09e+03 647.0 59.8 626.0 1.2e+03
    end_date       (id) datetime64[ns] 2006-10-13 2024-02-03 ... 1996-08-13
  * id             (id) object '020302' '020404' '020502' '020602' '020802'
    latitude       (id) float32 48.77 48.81 48.98 48.98 49.2
    longitude      (id) float32 -64.52 -64.92 -64.43 -64.7 -65.29
    name           (id) object 'Saint' 'York' ... 'Dartmouth' 'Madeleine'
    ...             ...
    spatial_agg    <U9 'watershed'
    start_date     (id) datetime64[ns] 1989-08-12 1980-10-01 ... 1970-01-01
  * time           (time) datetime64[ns] 1970-01-01 1970-01-02 ... 2025-12-31
    time_agg       <U4 'mean'
    timestep       <U1 'D'
    variable       <U10 'streamflow'
Data variables:
    streamflow     (id, time) float32 nan nan nan nan nan ... nan nan nan nan
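Once loaded, a Dataset of this shape lends itself to standard xarray reductions. The sketch below uses synthetic data (random values, made-up station ids) shaped like the `ds` above, so it runs without a live query:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the streamflow Dataset above: 2 stations, 2 years.
rng = np.random.default_rng(0)
time = pd.date_range("1970-01-01", periods=730, freq="D")
ds = xr.Dataset(
    {"streamflow": (("id", "time"), rng.random((2, 730)))},
    coords={"id": ["020302", "020404"], "time": time},
)

# Annual mean streamflow per station, via resampling on the time axis.
annual = ds.streamflow.resample(time="YS").mean()
print(dict(annual.sizes))
```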
[23]:
<xarray.Dataset>
Dimensions:        (data_type: 2, id: 7881, spatial_agg: 2, timestep: 1, time_agg: 1, latitude: 2800, longitude: 4680, time: 59413)
Coordinates: (12/15)
  * data_type      (data_type) <U5 'flow' 'level'
    drainage_area  (id) float64 dask.array<chunksize=(10,), meta=np.ndarray>
    end_date       (id, data_type, spatial_agg, timestep, time_agg) object dask.array<chunksize=(7881, 2, 2, 1, 1), meta=np.ndarray>
  * id             (id) <U7 '01AA002' '01AD001' ... '11AF004' '11AF005'
  * latitude       (latitude) float64 85.0 84.97 84.95 ... 15.07 15.05 15.02
  * longitude      (longitude) float64 -167.0 -167.0 -166.9 ... -50.05 -50.02
    ...             ...
    source         (id) object dask.array<chunksize=(7881,), meta=np.ndarray>
  * spatial_agg    (spatial_agg) object 'point' 'watershed'
    start_date     (id, data_type, spatial_agg, timestep, time_agg) object dask.array<chunksize=(7881, 2, 2, 1, 1), meta=np.ndarray>
  * time           (time) datetime64[ns] 1860-01-01 1860-01-02 ... 2022-08-31
  * time_agg       (time_agg) <U4 'mean'
  * timestep       (timestep) <U3 'day'
Data variables:
    mask           (id, latitude, longitude) float64 dask.array<chunksize=(1, 500, 500), meta=np.ndarray>
    value          (id, time, data_type, spatial_agg, timestep, time_agg) float64 dask.array<chunksize=(10, 59413, 1, 1, 1, 1), meta=np.ndarray>