s2spy.rgdr.rgdr

Response Guided Dimensionality Reduction.

Module Contents

Classes

RGDR

Response Guided Dimensionality Reduction.

Functions

spherical_area(→ float)

Approximate the area of a square grid cell on a spherical (!) earth.

cluster_area(→ float)

Determine the total area of a cluster.

remove_small_area_clusters(→ XrType)

Remove the clusters where the area is under the input threshold.

add_gridcell_area(data)

Add the area of each gridcell (latitude) in km2.

assert_clusters_present(→ None)

Assert that any (non-'0') clusters are present in the data.

_get_dbscan_clusters(→ numpy.ndarray)

Generate the DBSCAN cluster labels based on the correlation and p-value.

_find_clusters(→ xarray.DataArray)

Compute clusters and add their labels to the precursor dataset.

masked_spherical_dbscan(→ xarray.DataArray)

Determine the clusters based on sklearn's DBSCAN implementation.

_pearsonr_nan(→ tuple[float, float])

NaN friendly implementation of scipy.stats.pearsonr.

correlation(→ tuple[xarray.DataArray, xarray.DataArray])

Calculate correlation maps.

partial_correlation(field, target, z)

Calculate partial correlation maps.

regression(field, target)

Regression analysis on entire maps.

stack_input_data(precursor, target, ...)

Stack input data.

Attributes

RADIUS_EARTH_KM

SURFACE_AREA_EARTH_KM2

XrType

s2spy.rgdr.rgdr.RADIUS_EARTH_KM = 6371
s2spy.rgdr.rgdr.SURFACE_AREA_EARTH_KM2 = 510072000.0
s2spy.rgdr.rgdr.XrType
s2spy.rgdr.rgdr.spherical_area(latitude: float, dlat: float, dlon: float | None = None) → float

Approximate the area of a square grid cell on a spherical (!) earth.

Returns the area in square kilometers of earth surface.

Parameters:
  • latitude (float) – Latitude at the center of the grid cell (deg)

  • dlat (float) – Latitude grid resolution (deg)

  • dlon (float, optional) – Longitude grid resolution (deg). Optional in case of a square grid.

Returns:

Area of the grid cell (km^2)

Return type:

float
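
As a rough illustration of the geometry involved, the sketch below computes the area of a latitude-longitude cell on a sphere. The function name cell_area_km2 is hypothetical, and the fallback of dlon to dlat is an assumption based on the "optional in case of a square grid" note above; it is not necessarily how spherical_area is implemented.

```python
import math

RADIUS_EARTH_KM = 6371  # value listed in this module's attributes


def cell_area_km2(latitude, dlat, dlon=None):
    """Area (km^2) of a lat/lon grid cell on a sphere (hypothetical helper)."""
    if dlon is None:
        # Assumption: a square grid, so both resolutions are equal.
        dlon = dlat
    lat_north = math.radians(latitude + dlat / 2)
    lat_south = math.radians(latitude - dlat / 2)
    # Area of a spherical "rectangle": R^2 * dlon * (sin(north) - sin(south)).
    return RADIUS_EARTH_KM**2 * math.radians(dlon) * (
        math.sin(lat_north) - math.sin(lat_south)
    )


equator = cell_area_km2(0.0, 1.0)
polar = cell_area_km2(80.0, 1.0)
print(f"1-degree cell at the equator: {equator:.0f} km^2")
print(f"1-degree cell at 80 degrees:  {polar:.0f} km^2")
```

Cells of equal size in degrees shrink towards the poles, which is why the area is later used as a weight.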

s2spy.rgdr.rgdr.cluster_area(ds: XrType, cluster_label: float) → float

Determine the total area of a cluster.

Requires the input dataset to have the variables area and cluster_labels.

Parameters:
  • ds (xr.Dataset or xr.DataArray) – Dataset/DataArray containing the variables area and cluster_labels.

  • cluster_label (float) – The label (as float) for which the area should be calculated.

Returns:

Area of the cluster cluster_label.

Return type:

float
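
The idea can be sketched with plain NumPy arrays instead of xarray objects. The toy values and the name cluster_area_sketch are hypothetical; the real function operates on the area and cluster_labels variables of ds.

```python
import numpy as np

# Hypothetical toy data: one label and one area (km^2) per grid cell.
cluster_labels = np.array([0.0, 1.0, 1.0, -1.0, 1.0, 0.0])
area = np.array([100.0, 120.0, 130.0, 90.0, 110.0, 95.0])


def cluster_area_sketch(area, cluster_labels, cluster_label):
    """Sum the area of all cells that carry `cluster_label`."""
    return float(area[cluster_labels == cluster_label].sum())


print(cluster_area_sketch(area, cluster_labels, 1.0))   # 360.0
print(cluster_area_sketch(area, cluster_labels, -1.0))  # 90.0
```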

s2spy.rgdr.rgdr.remove_small_area_clusters(ds: XrType, min_area_km2: float) → XrType

Remove the clusters where the area is under the input threshold.

Parameters:
  • ds (xr.DataArray, xr.Dataset) – Dataset containing cluster_labels and area.

  • min_area_km2 (float) – The minimum allowed area of each cluster

Returns:

The input dataset, with the labels of the clusters set to 0 when the area of the cluster is under the min_area_km2 threshold.

Return type:

xr.DataArray, xr.Dataset
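
A minimal sketch of the thresholding step, again on NumPy arrays rather than xarray objects. The helper name and toy values are hypothetical, and the real function may differ in detail.

```python
import numpy as np


def remove_small_clusters_sketch(cluster_labels, area, min_area_km2):
    """Return a copy of the labels with too-small clusters relabelled to 0."""
    cleaned = cluster_labels.copy()
    for label in np.unique(cluster_labels):
        if label == 0:
            continue  # 0 already means "not in a cluster"
        in_cluster = cluster_labels == label
        if area[in_cluster].sum() < min_area_km2:
            cleaned[in_cluster] = 0
    return cleaned


labels = np.array([0.0, 1.0, 1.0, -1.0, 1.0, 0.0])
area = np.array([100.0, 120.0, 130.0, 90.0, 110.0, 95.0])
cleaned = remove_small_clusters_sketch(labels, area, min_area_km2=200.0)
print(cleaned)
```

In this toy case cluster 1 (360 km^2) is kept, while cluster -1 (90 km^2) falls under the threshold and is relabelled to 0.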

s2spy.rgdr.rgdr.add_gridcell_area(data: xarray.DataArray)

Add the area of each gridcell (latitude) in km2.

Note: Assumes an even grid (in degrees)

Parameters:

data – Data containing lat, lon coordinates in degrees.

Returns:

Input data with an added coordinate “area”.

s2spy.rgdr.rgdr.assert_clusters_present(data: xarray.DataArray) → None

Assert that any (non-‘0’) clusters are present in the data.

s2spy.rgdr.rgdr._get_dbscan_clusters(data: xarray.Dataset, coords: numpy.ndarray, dbscan_params: dict) → numpy.ndarray

Generate the DBSCAN cluster labels based on the correlation and p-value.

Parameters:
  • data – DataArray of the precursor field, of only a single i_interval. Requires the ‘latitude’ and ‘longitude’ dimensions to be stacked into a “coords” dimension.

  • coords – 2-D array containing the coordinates of each (lat, lon) grid point, in radians.

  • dbscan_params – Dictionary containing the elements ‘alpha’, ‘eps’, ‘min_area_km2’. See the documentation of RGDR for more information.

Returns:

1-D array of the same length as coords, containing cluster labels for every coordinate.

Return type:

np.ndarray

s2spy.rgdr.rgdr._find_clusters(precursor: xarray.DataArray, corr: xarray.DataArray, p_val: xarray.DataArray, dbscan_params: dict) → xarray.DataArray

Compute clusters and add their labels to the precursor dataset.

For clustering the DBSCAN algorithm is used, with a Haversine distance metric.

Parameters:
  • precursor (xr.DataArray) – DataArray of the precursor field, containing ‘latitude’ and ‘longitude’ dimensions in degrees.

  • corr (xr.DataArray) – DataArray with the correlation values, generated by correlation()

  • p_val (xr.DataArray) – DataArray with the p-values, generated by correlation()

  • dbscan_params (dict) – Dictionary containing the elements ‘alpha’, ‘eps’, ‘min_area_km2’. See the documentation of RGDR for more information.

Returns:

The input precursor data, with the labelled clusters as an extra coordinate.

Return type:

xr.DataArray
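
The Haversine metric mentioned above measures great-circle distance on a sphere. The sketch below shows the formula and how a distance threshold in kilometres can be expressed as an angle in radians. The helper name and the eps_km value are hypothetical, and whether the library converts its threshold in exactly this way is an assumption.

```python
import math

RADIUS_EARTH_KM = 6371  # value listed in this module's attributes


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    )
    return 2 * RADIUS_EARTH_KM * math.asin(math.sqrt(a))


# A distance threshold in km, expressed as an angle on the unit sphere.
eps_km = 600.0  # hypothetical value
eps_rad = eps_km / RADIUS_EARTH_KM

# A quarter of the equator, as a sanity check of the formula.
quarter = haversine_km(0.0, 0.0, 0.0, 90.0)
print(f"{quarter:.1f} km, eps = {eps_rad:.4f} rad")
```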

s2spy.rgdr.rgdr.masked_spherical_dbscan(precursor: xarray.DataArray, corr: xarray.DataArray, p_val: xarray.DataArray, dbscan_params: dict) → xarray.DataArray

Determine the clusters based on sklearn’s DBSCAN implementation.

Alpha determines the mask based on the minimum p_value. Grouping can be adjusted using the eps_km parameter. Cluster labels are negative for areas with a negative correlation coefficient and positive for areas with a positive correlation coefficient. Areas without any significant correlation are put in the cluster labelled ‘0’.

Parameters:
  • precursor (xr.DataArray) – DataArray of the precursor field, containing ‘latitude’ and ‘longitude’ dimensions in degrees.

  • corr (xr.DataArray) – DataArray with the correlation values, generated by correlation()

  • p_val (xr.DataArray) – DataArray with the p-values, generated by correlation()

  • dbscan_params (dict) – Dictionary containing the elements ‘alpha’, ‘eps’, ‘min_area_km2’. See the documentation of RGDR for more information.

Returns:

Precursor data grouped by the DBSCAN clusters.

Return type:

xr.DataArray
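
The masking and sign convention described above can be illustrated without the clustering step itself. In this sketch the function name and toy values are hypothetical; it only marks which cells are significant and with which sign, and leaves the actual DBSCAN grouping out.

```python
import numpy as np


def signed_significance_mask(corr, p_val, alpha):
    """Return +1 / -1 for significant positive / negative cells, else 0."""
    significant = p_val < alpha
    return np.where(significant, np.sign(corr), 0.0)


corr = np.array([0.8, -0.6, 0.1, -0.7])
p_val = np.array([0.01, 0.03, 0.60, 0.20])
mask = signed_significance_mask(corr, p_val, alpha=0.05)
print(mask)
```

Cells with a mask value of 0 correspond to the cluster labelled '0'; the positive and negative cells would each be candidates for clustering.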

s2spy.rgdr.rgdr._pearsonr_nan(x: numpy.ndarray, y: numpy.ndarray) → tuple[float, float]

NaN friendly implementation of scipy.stats.pearsonr.

Calculates the correlation coefficient between two arrays, as well as the p-value of this correlation. However, instead of raising an error when encountering NaN values, this function will return both the correlation coefficient and the p-value as NaN.

Parameters:
  • x – 1-D array

  • y – 1-D array

Returns:

The correlation coefficient (r_coefficient) and its p-value (p_value).
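
A minimal sketch of this behaviour, assuming SciPy is installed. The wrapper name pearsonr_nan_sketch is hypothetical and the check shown here is only one way to obtain the described result.

```python
import numpy as np
from scipy.stats import pearsonr


def pearsonr_nan_sketch(x, y):
    """Return (r, p), or (nan, nan) if either input contains a NaN."""
    if np.isnan(x).any() or np.isnan(y).any():
        return np.nan, np.nan
    r, p = pearsonr(x, y)
    return float(r), float(p)


x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
r, p = pearsonr_nan_sketch(x, y)

# The same data with one missing value yields NaN for both outputs.
y_gap = y.copy()
y_gap[2] = np.nan
r_gap, p_gap = pearsonr_nan_sketch(x, y_gap)
print(r, p, r_gap, p_gap)
```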

s2spy.rgdr.rgdr.correlation(field: xarray.DataArray, target: xarray.DataArray, corr_dim: str = 'time') → tuple[xarray.DataArray, xarray.DataArray]

Calculate correlation maps.

Parameters:
  • field – Spatial data with a dimension named corr_dim, over which each location should have the Pearson correlation coefficient calculated with the target data.

  • target – Data which has to be correlated with the spatial data. Requires a dimension named corr_dim.

  • corr_dim – Dimension over which the correlation coefficient should be calculated.

Returns:
  • r_coefficient – DataArray filled with the correlation coefficient for each non-corr_dim coordinate.

  • p_value – DataArray filled with the two-tailed p-values for each computed correlation coefficient.

Return type:

tuple[xarray.DataArray, xarray.DataArray]
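
To make the idea of a correlation map concrete, the sketch below computes the Pearson coefficient between a target series and every grid cell of a field using NumPy only. The function name, array layout (time, lat, lon) and synthetic data are assumptions for illustration; p-values are left out.

```python
import numpy as np


def correlation_map_sketch(field, target):
    """Pearson r between `target` (time,) and each cell of `field` (time, lat, lon)."""
    f_anom = field - field.mean(axis=0)
    t_anom = target - target.mean()
    # Sum of anomaly products per cell; the 1/n factors cancel in the ratio.
    cov = np.tensordot(t_anom, f_anom, axes=(0, 0))
    norm = np.sqrt((f_anom**2).sum(axis=0) * (t_anom**2).sum())
    return cov / norm


rng = np.random.default_rng(0)
target = rng.normal(size=50)
field = rng.normal(size=(50, 2, 3))
field[:, 0, 0] = 2.0 * target + 1.0  # perfectly correlated cell
field[:, 1, 2] = -target             # perfectly anti-correlated cell
r_map = correlation_map_sketch(field, target)
print(r_map.round(2))
```

The correlation dimension (here the first axis) is reduced, leaving one coefficient per remaining coordinate.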

s2spy.rgdr.rgdr.partial_correlation(field, target, z)

Calculate partial correlation maps.

s2spy.rgdr.rgdr.regression(field, target)

Regression analysis on entire maps.

Methods include Linear, Ridge, Lasso.

s2spy.rgdr.rgdr.stack_input_data(precursor, target, precursor_intervals, target_intervals)

Stack input data.

class s2spy.rgdr.rgdr.RGDR(target_intervals: int | list[int], lag: int, eps_km: float, alpha: float, min_area_km2: float | None = None)

Response Guided Dimensionality Reduction.

property target_intervals: list[int]

Return target intervals.

property precursor_intervals: list[int]

Return precursor intervals.

property cluster_map: xarray.DataArray

Return cluster map.

property pval_map: xarray.DataArray

Return p-value map.

property corr_map: xarray.DataArray

Return correlation map.

get_correlation(precursor: xarray.DataArray, target: xarray.DataArray) → tuple[xarray.DataArray, xarray.DataArray]

Calculate the correlation and p-value between input precursor and target.

Parameters:
  • precursor – Precursor field data with the dimensions ‘latitude’, ‘longitude’, and ‘anchor_year’

  • target – Timeseries data with only the dimension ‘anchor_year’

Returns:

DataArrays containing the correlation and p-value, as (correlation, p_value).

Return type:

tuple[xarray.DataArray, xarray.DataArray]

get_clusters(precursor: xarray.DataArray, target: xarray.DataArray) → xarray.DataArray

Generate clusters for the precursor data.

Parameters:
  • precursor – Precursor field data with the dimensions ‘latitude’, ‘longitude’, ‘anchor_year’, and ‘i_interval’

  • target – Target timeseries data with only the dimensions ‘anchor_year’ and ‘i_interval’

Returns:

DataArray containing the clusters as masks.

preview_correlation(precursor: xarray.DataArray, target: xarray.DataArray, add_alpha_hatch: bool = True, ax1: matplotlib.pyplot.Axes | None = None, ax2: matplotlib.pyplot.Axes | None = None) → list[matplotlib.collections.QuadMesh]

Preview correlation and p-value results with given inputs.

Generate a figure showing the correlation and p-value results for the initialized RGDR instance and the input precursor field.

Parameters:
  • precursor – Precursor field data with the dimensions ‘latitude’, ‘longitude’, ‘anchor_year’, and ‘i_interval’

  • target – Target timeseries data with only the dimensions ‘anchor_year’ and ‘i_interval’

  • add_alpha_hatch – Adds a red hatching when the p-value is lower than the RGDR’s ‘alpha’ value.

  • ax1 – a matplotlib axis handle to plot the correlation values into. If None, an axis handle will be created instead.

  • ax2 – a matplotlib axis handle to plot the p-values into. If None, an axis handle will be created instead.

Returns:

List of matplotlib QuadMesh artists.

preview_clusters(precursor: xarray.DataArray, target: xarray.DataArray, ax: matplotlib.pyplot.Axes | None = None, **kwargs) → matplotlib.collections.QuadMesh

Preview clusters.

Generate a figure showing the clusters that result from the initialized RGDR instance and the input precursor field.

Parameters:
  • precursor – Precursor field data with the dimensions ‘latitude’, ‘longitude’, ‘anchor_year’, and ‘i_interval’

  • target – Target timeseries data with only the dimensions ‘anchor_year’ and ‘i_interval’

  • ax (plt.Axes, optional) – a matplotlib axis handle to plot the clusters into. If None, an axis handle will be created instead.

  • **kwargs – Keyword arguments that should be passed to QuadMesh.

Returns:

Matplotlib QuadMesh artist.

fit(precursor: xarray.DataArray, target: xarray.DataArray)

Fit RGDR clusters to precursor data.

Performs DBSCAN clustering on a prepared DataArray, and then groups the data by their determined clusters, using a weighted mean. The weight is based on the area of each grid cell.

Density-Based Spatial Clustering of Applications with Noise (DBSCAN) groups together grid cells that have the same correlation sign and lie in proximity to each other.

Clusters labelled with a positive value represent a positive correlation with the target timeseries, the clusters labelled with a negative value represent a negative correlation. All locations not in a cluster are grouped together under the label ‘0’.

Parameters:
  • precursor – Precursor field data with the dimensions ‘latitude’, ‘longitude’, ‘anchor_year’, and ‘i_interval’

  • target – Target timeseries data with only the dimensions ‘anchor_year’ and ‘i_interval’, which will be correlated with the precursor field.

Returns:

The precursor data, with the latitude and longitude dimensions reduced to clusters.

Return type:

xr.DataArray
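
The reduction step described above, an area-weighted mean per cluster, can be sketched as follows. The data are flattened to (time, cell) NumPy arrays, and the function name and toy values are hypothetical; the real method works on xarray objects with latitude and longitude dimensions.

```python
import numpy as np


def cluster_weighted_means(data, cluster_labels, area):
    """Area-weighted mean of `data` (time, cell) for every cluster label."""
    result = {}
    for label in np.unique(cluster_labels):
        in_cluster = cluster_labels == label
        # Normalise the cell areas within the cluster so the weights sum to 1.
        weights = area[in_cluster] / area[in_cluster].sum()
        result[float(label)] = data[:, in_cluster] @ weights
    return result


data = np.array([[1.0, 2.0, 4.0, 8.0],
                 [2.0, 2.0, 6.0, 0.0]])
labels = np.array([1, 1, -1, 0])
area = np.array([1.0, 3.0, 2.0, 5.0])
means = cluster_weighted_means(data, labels, area)
print(means)
```

Each cluster label ends up with one time series, so the two spatial dimensions collapse into a single cluster dimension.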

transform(data: xarray.DataArray) → xarray.DataArray

Apply RGDR on the input data, based on the previous fit.

Transform will use the clusters previously generated when RGDR was fit, and use these clusters to reduce the latitude and longitude dimensions of the input data.

fit_transform(precursor: xarray.DataArray, timeseries: xarray.DataArray)

Fit RGDR clusters to the precursor data and apply RGDR to that data.

Parameters:
  • precursor – Precursor field data with the dimensions ‘latitude’, ‘longitude’, and ‘anchor_year’

  • timeseries – Timeseries data with only the dimension ‘anchor_year’, which will be correlated with the precursor field.

Returns:

The precursor data, with the latitude and longitude dimensions reduced to clusters.

Return type:

xr.DataArray

__repr__() → str

Represent the RGDR transformer with strings.