s2spy.rgdr.label_alignment
==========================

.. py:module:: s2spy.rgdr.label_alignment

.. autoapi-nested-parse::

   Label alignment tools for RGDR clusters.


Functions
---------

.. autoapisummary::

   s2spy.rgdr.label_alignment._get_split_cluster_dict
   s2spy.rgdr.label_alignment._flatten_cluster_dict
   s2spy.rgdr.label_alignment._init_overlap_df
   s2spy.rgdr.label_alignment._calculate_overlap
   s2spy.rgdr.label_alignment.calculate_overlap_table
   s2spy.rgdr.label_alignment.get_overlapping_clusters
   s2spy.rgdr.label_alignment.remove_subsets
   s2spy.rgdr.label_alignment.remove_overlapping_clusters
   s2spy.rgdr.label_alignment.name_clusters
   s2spy.rgdr.label_alignment.create_renaming_dict
   s2spy.rgdr.label_alignment.ensure_unique_names
   s2spy.rgdr.label_alignment._rename_datasets
   s2spy.rgdr.label_alignment.rename_labels


Module Contents
---------------

.. py:function:: _get_split_cluster_dict(cluster_labels: xarray.DataArray) -> dict

   Generate a dictionary of all cluster labels in each split.

   :param cluster_labels: DataArray containing all the cluster maps, with the
       dimension "split" for the different clusters over splits.

   :returns: A dictionary in the form ``{0: [cluster_a, cluster_b], 1: [cluster_a], ...}``.


.. py:function:: _flatten_cluster_dict(cluster_dict: dict) -> list[tuple[int, int]]

   Flatten a cluster dictionary to a list with (split, cluster) as values.

   For example, if the input is ``{0: [-1, -2, 1], 1: [-1, 1]}``, this function
   will return the following list: ``[(0, -1), (0, -2), (0, 1), (1, -1), (1, 1)]``.

   :param cluster_dict: The cluster dictionary which should be flattened.

   :returns: A list of the clusters and their splits.


.. py:function:: _init_overlap_df(cluster_labels: xarray.DataArray)

   Build an empty dataframe with multi-indexes for splits and labels.
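
   A minimal sketch of how such an empty table could be built with pandas,
   assuming the clusters are first collected in a dictionary of the form
   returned by ``_get_split_cluster_dict`` and then flattened to (split, label)
   pairs as ``_flatten_cluster_dict`` describes. The helper
   ``build_empty_overlap_df`` is hypothetical and not part of s2spy:

   .. code-block:: python

      import numpy as np
      import pandas as pd


      def build_empty_overlap_df(cluster_dict: dict) -> pd.DataFrame:
          """Hypothetical helper: NaN-filled overlap table indexed by (split, label)."""
          # Flatten {split: [labels]} into [(split, label), ...]
          pairs = [(split, label) for split, labels in cluster_dict.items() for label in labels]
          index = pd.MultiIndex.from_tuples(pairs, names=["split", "label"])
          # Rows and columns share the same multi-index; every cell starts as NaN
          return pd.DataFrame(np.nan, index=index, columns=index)


      df = build_empty_overlap_df({0: [-1], 1: [-1]})
      print(df.shape)  # (2, 2)

   Because both axes share the same named multi-index, a single overlap value
   can be addressed as ``df.loc[(0, -1), (1, -1)]``.
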

   The structure will be something like the following table::

            split |        0         1
            label |       -1        -1
      ------------|-------------------
      split label |
      0     -1    |      NaN  0.583333
      1     -1    | 0.333333       NaN

   The same multi-index is used for both rows and columns, such that the
   dataframe can be populated with the overlap between labels from different
   splits.

   :param cluster_labels: DataArray containing all the cluster maps, with the
       dimension "split" for the different clusters over splits.

   :returns: An empty pandas DataFrame, with the (split, label) multi-index on
       both the rows and the columns.


.. py:function:: _calculate_overlap(cluster_labels: xarray.DataArray, split_a: int, cluster_a: int, split_b: int, cluster_b: int) -> float

   Calculate the overlapping fraction between two clusters, over different splits.

   The overlap is defined as::

      overlap = n_overlapping_cells / total_cells_cluster_a

   :param cluster_labels: DataArray containing all the cluster maps, with the
       dimension "split" for the different clusters over splits.
   :param split_a: The index of the split of the first cluster.
   :param cluster_a: The value of the first cluster in the cluster_labels DataArray.
   :param split_b: The index of the split of the second cluster.
   :param cluster_b: The value of the second cluster in the cluster_labels DataArray.

   :returns: Overlap of the first cluster with the second cluster, as a fraction (0.0 - 1.0).


.. py:function:: calculate_overlap_table(cluster_labels: xarray.DataArray) -> pandas.DataFrame

   Fill the overlap table with the overlap between clusters over different splits.

   :param cluster_labels: DataArray containing all the cluster maps, with the
       dimension "split" for the different clusters over splits.

   :returns: The overlap table with all valid combinations filled in. Invalid
       combinations of clusters (a cluster with itself, or two clusters within
       the same split) are left as NaN.


.. py:function:: get_overlapping_clusters(cluster_labels: xarray.DataArray, min_overlap: float = 0.1) -> set

   Create sets of overlapping clusters.
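
   A minimal sketch of how clusters from different splits could be paired by
   overlap, using plain NumPy arrays instead of an xarray DataArray. It assumes
   each split's cluster map is a 2D integer array in which 0 marks cells outside
   any cluster, computes the overlap as the number of shared cells divided by
   the size of the first cluster (as ``_calculate_overlap`` defines it), and
   keeps a pair when either direction reaches ``min_overlap``. It only builds
   pairs and does not merge them into larger groups. The helpers
   ``overlap_fraction`` and ``overlapping_pairs`` are hypothetical:

   .. code-block:: python

      import numpy as np


      def overlap_fraction(labels_a, cluster_a, labels_b, cluster_b):
          """Fraction of the cells of cluster_a that also belong to cluster_b."""
          in_a = labels_a == cluster_a
          in_b = labels_b == cluster_b
          return np.sum(in_a & in_b) / np.sum(in_a)


      def overlapping_pairs(split_labels: dict, min_overlap: float = 0.1) -> set:
          """Hypothetical helper: pair up clusters from different splits that overlap enough.

          split_labels maps a split index to a 2D array of cluster labels (0 = no cluster).
          """
          pairs = set()
          for split_a, labels_a in split_labels.items():
              for split_b, labels_b in split_labels.items():
                  if split_a == split_b:
                      continue  # clusters within one split are never compared
                  for cl_a in np.unique(labels_a[labels_a != 0]):
                      for cl_b in np.unique(labels_b[labels_b != 0]):
                          # One-way test from cluster a's side; since the loops also
                          # visit (b, a), a pair is kept if either direction passes.
                          if overlap_fraction(labels_a, cl_a, labels_b, cl_b) >= min_overlap:
                              pairs.add(frozenset({f"{split_a}_{cl_a}", f"{split_b}_{cl_b}"}))
          return pairs


      m0 = np.zeros((5, 5), dtype=int)
      m0[:4, :] = -1  # split 0: large cluster of 20 cells
      m1 = np.zeros((5, 5), dtype=int)
      m1[3:, 0] = -1  # split 1: small cluster of 2 cells, 1 of them shared

      print(overlap_fraction(m0, -1, m1, -1))  # 0.05
      print(overlap_fraction(m1, -1, m0, -1))  # 0.5
      print(overlapping_pairs({0: m0, 1: m1}) == {frozenset({"0_-1", "1_-1"})})  # True

   With these two maps the overlap is 0.05 from the large cluster's side and 0.5
   from the small cluster's side, so the pair is kept with the default threshold
   of 0.1.
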

   Clusters are considered to have sufficient overlap if they overlap by at
   least the minimum threshold. Note that this is a one-way criterion. For
   example, if the overlap table is like the following::

            split |     0      1
            label |    -1     -1
      ------------|-------------
      split label |
      0     -1    |   NaN   0.05
      1     -1    |  0.20    NaN

   Then cluster (split: 0, label: -1) overlaps with cluster (1, -1) by 0.05.
   With the default threshold of 0.1, this is insufficient for them to be
   considered the same cluster. However, cluster (1, -1) does overlap by 0.20
   with cluster (0, -1), so they *will* be considered the same cluster. This
   situation can arise when one cluster is much bigger than the other.

   In this example, the overlapping set will be ``{frozenset({"0_-1", "1_-1"})}``.
   Note that with a threshold of 0.05 the output would not change, as the two
   nested sets ``{"0_-1", "1_-1"}`` and ``{"1_-1", "0_-1"}`` are the same.

   :param cluster_labels: DataArray containing all the cluster maps, with the
       dimension "split" for the different clusters over splits.
   :param min_overlap: Minimum overlap (0.0 - 1.0) at which clusters are
       considered to be sufficiently overlapping to belong to the same signal.
       Defaults to 0.1.

   :returns: A set of (frozen) sets, each set corresponding to a possible
       combination of clusters that overlap.


.. py:function:: remove_subsets(clusters: set) -> set

   Remove subsets from the clusters.

   For example: ``{{"A"}, {"A", "B"}}`` will become ``{{"A", "B"}}``, as
   ``{"A"}`` is a subset of the bigger cluster.


.. py:function:: remove_overlapping_clusters(clusters: set) -> set

   Remove clusters shared between two different groups of clusters.

   The largest group gets priority: a shared cluster is kept in the largest
   group and removed from the others. For example:
   ``{{"A", "D"}, {"A", "B", "C"}}`` will become ``{{"D"}, {"A", "B", "C"}}``.


.. py:function:: name_clusters(clusters: set) -> dict

   Give each cluster a unique name.

   Note: the first 26 names will be from A - Z. If more than 26 clusters are
   present, these will get names with two uppercase letters (AA - ZZ).

   :param clusters: A set of different clusters. Each element is a list of
       clusters and their splits.

   :returns: A dictionary in the form ``{cluster_name0: cluster0, cluster_name1: cluster1}``.


.. py:function:: create_renaming_dict(aligned_clusters: dict) -> dict[int, list[tuple[int, str]]]

   Create a dictionary that can be used to rename the clusters to the aligned names.

   :param aligned_clusters: A dictionary containing the different splits, and
       the mapping of RGDR clusters to new names.

   :returns: A dictionary with the structure
       ``{split: [(old_name0, new_name0), (old_name1, new_name1)]}``.


.. py:function:: ensure_unique_names(renaming_dict: dict[int, list[tuple[int, str]]]) -> dict

   Ensure that in every split, every cluster has a unique name.

   The function finds the non-unique names within each split, and renames these
   by adding a number. For example, if there are three clusters in the first
   split with the name "C", the new names will be "C", "C1" and "C2".

   If renaming_dict is the following::

      {0: [(-1, "A"), (1, "B")], 1: [(-1, "A"), (-2, "A")]}

   The renamed dictionary will be::

      {0: [(-1, "A1"), (1, "B")], 1: [(-1, "A1"), (-2, "A2")]}

   :param renaming_dict: Renaming dictionary with non-unique names.

   :returns: Renaming dictionary with only unique names.


.. py:function:: _rename_datasets(rgdr_list: list[s2spy.rgdr.rgdr.RGDR], clustered_data: list[xarray.DataArray], renaming_dict: dict) -> list[xarray.DataArray]

   Apply the renaming dictionary to the labels of the clustered data.

   :param rgdr_list: List of RGDR objects that were used to fit and transform the data.
   :param clustered_data: List of the RGDR-transformed data. This can either be
       the training data or the test data.
   :param renaming_dict: Dictionary containing the mapping ``{old_label: new_label}``.

   :returns: A list of the input clustered data, with the labels renamed.


.. py:function:: rename_labels(rgdr_list: list[s2spy.rgdr.rgdr.RGDR], clustered_data: list[xarray.DataArray]) -> list[xarray.DataArray]

   Return a new object with renamed cluster labels aligned over different splits.

   To help users compare the clustering over different splits, this function
   tries to match the clusters over different splits, and gives clusters that
   are in the same region the same name (e.g. "A"). The clusters themselves are
   not changed; only their labels are renamed.

   :param rgdr_list: List of RGDR objects that were used to fit and transform the data.
   :param clustered_data: List of the RGDR-transformed datasets. This can either
       be the training data or the test data.

   :returns: A list of the input clustered data, with the labels renamed.
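
   The alignment idea described above can be sketched end to end on plain NumPy
   arrays. This is a hypothetical illustration, not the s2spy implementation: it
   assumes each split's cluster map is a 2D integer array in which 0 marks cells
   outside any cluster, applies the one-way overlap criterion of
   ``get_overlapping_clusters``, and merges overlapping clusters with a simple
   union-find instead of the ``remove_subsets`` and
   ``remove_overlapping_clusters`` steps. It also does not deduplicate names
   within a split, which is the role of ``ensure_unique_names``. The function
   name ``align_labels`` is made up for this example:

   .. code-block:: python

      import string

      import numpy as np


      def align_labels(split_maps: dict, min_overlap: float = 0.1) -> dict:
          """Hypothetical sketch: give overlapping clusters of different splits one shared name.

          split_maps maps a split index to a 2D integer array of cluster labels,
          where 0 means "no cluster". Returns {split: {old_label: new_name}}.
          """
          # Every (split, label) pair, skipping the 0 "no cluster" label
          nodes = [(s, int(lab)) for s, m in split_maps.items() for lab in np.unique(m) if lab != 0]
          parent = {node: node for node in nodes}

          def find(node):
              # Follow parent links up to the representative of the node's group
              while parent[node] != node:
                  node = parent[node]
              return node

          for split_a, label_a in nodes:
              for split_b, label_b in nodes:
                  if split_a == split_b:
                      continue  # clusters within the same split are never merged directly
                  in_a = split_maps[split_a] == label_a
                  in_b = split_maps[split_b] == label_b
                  # One-way overlap: share of cluster a's cells that also lie in cluster b.
                  # The loops also visit (b, a), so either direction can trigger a merge.
                  if np.sum(in_a & in_b) / np.sum(in_a) >= min_overlap:
                      parent[find((split_a, label_a))] = find((split_b, label_b))

          # Collect the groups, ordered by their first (split, label) member
          groups = {}
          for node in nodes:
              groups.setdefault(find(node), []).append(node)
          if len(groups) > len(string.ascii_uppercase):
              raise ValueError("this sketch only supports up to 26 aligned clusters")

          renaming = {s: {} for s in split_maps}
          for name, members in zip(string.ascii_uppercase, sorted(groups.values())):
              for s, lab in members:
                  renaming[s][lab] = name
          return renaming


      m0 = np.zeros((5, 5), dtype=int)
      m0[:4, :] = -1  # large cluster covering the top four rows
      m0[4, 3:] = 1   # small cluster in the bottom-right corner
      m1 = np.zeros((5, 5), dtype=int)
      m1[3:, 0] = -1  # two cells, one of them inside split 0's cluster -1
      m1[4, 4] = 2    # one cell, inside split 0's cluster 1

      print(align_labels({0: m0, 1: m1}))
      # {0: {-1: 'A', 1: 'B'}, 1: {-1: 'A', 2: 'B'}}

   Here cluster -1 of split 1 lies half inside cluster -1 of split 0, so both
   get the name "A", even though the overlap seen from the much larger cluster
   of split 0 is only 0.05.
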