Binning functions

SciKit-GStat implements a number of binning functions, which aggregate the distance matrix into lag classes, or bins. The available functions are listed below; they usually accept more than one method identifier:

skgstat.binning.even_width_lags(distances, n, maxlag)

Even lag edges

Calculate the lag edges for a given number of bins using the same lag step width for all bins.

Changed in version 0.3.8: Function returns None as second value to indicate that the number of lag classes was not changed.

Parameters:
  • distances (numpy.array) – Flat numpy array representing the upper triangle of the distance matrix.

  • n (integer) – Number of lag classes to find

  • maxlag (integer, float) – Limit the last lag class to this separating distance.

Returns:

bin_edges – The upper bin edges of the lag classes

Return type:

numpy.ndarray
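The even-width rule can be illustrated with plain NumPy. This is a minimal sketch, not the library's implementation: the helper name even_width_edges is hypothetical, and it assumes that maxlag has already been resolved to an absolute distance.

```python
import numpy as np

def even_width_edges(n, maxlag):
    """Hypothetical helper: n upper bin edges with a constant step width."""
    # n + 1 equally spaced points on [0, maxlag]; dropping the leading 0
    # leaves only the upper edge of each lag class.
    return np.linspace(0, maxlag, n + 1)[1:]

distances = np.array([1.0, 2.5, 4.0, 7.5, 9.0])
edges = even_width_edges(n=4, maxlag=10.0)

# Count the point pairs per lag class: the classes share one width,
# but the number of pairs in each class is free to differ.
counts, _ = np.histogram(distances, bins=np.concatenate(([0.0], edges)))
print(edges, counts)
```

Because the step width is fixed, sparse distance ranges can end up with few or no point pairs in a class.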

skgstat.binning.uniform_count_lags(distances, n, maxlag)

Uniform lag counts

Calculate the lag edges for a given number of bins with the same number of observations in each lag class. The lag step width will be variable.

Changed in version 0.3.8: Function returns None as second value to indicate that the number of lag classes was not changed.

Parameters:
  • distances (numpy.array) – Flat numpy array representing the upper triangle of the distance matrix.

  • n (integer) – Number of lag classes to find

  • maxlag (integer, float) – Limit the last lag class to this separating distance.

Returns:

bin_edges – The upper bin edges of the lag classes

Return type:

numpy.ndarray
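One way to obtain classes with equal counts is to place the upper edges at evenly spaced quantiles of the distances. The sketch below assumes exactly that; the helper name uniform_count_edges is hypothetical, and discarding pairs beyond maxlag before computing the quantiles is an assumption as well.

```python
import numpy as np

def uniform_count_edges(distances, n, maxlag):
    """Hypothetical helper: upper edges at the quantiles 1/n, 2/n, ..., 1."""
    d = np.asarray(distances, dtype=float)
    # Assumption: pairs separated by more than maxlag are ignored.
    d = d[d <= maxlag]
    # Percentiles 100/n, 200/n, ..., 100 (the leading 0 is dropped).
    q = np.linspace(0, 100, n + 1)[1:]
    return np.percentile(d, q)

distances = np.arange(1, 13, dtype=float)  # 12 separating distances
edges = uniform_count_edges(distances, n=3, maxlag=12.0)

# Cumulative number of pairs at or below each upper edge.
cumulative = [int(np.sum(distances <= e + 1e-9)) for e in edges]
print(edges, cumulative)
```

The edges follow the data density, so the step width shrinks where distances are frequent and grows where they are rare.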

skgstat.binning.auto_derived_lags(distances, method_name, maxlag)

Derive bins automatically

Added in version 0.3.8.

Uses numpy.histogram_bin_edges to derive the lag classes automatically. Supports any method supported by numpy.histogram_bin_edges. It is recommended to use 'sturges', 'doane' or 'fd'.

Parameters:
  • distances (numpy.array) – Flat numpy array representing the upper triangle of the distance matrix.

  • maxlag (integer, float) – Limit the last lag class to this separating distance.

  • method_name (str) – Any method supported by numpy.histogram_bin_edges

Returns:

bin_edges – The upper bin edges of the lag classes

Return type:

numpy.ndarray
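The following sketch shows how numpy.histogram_bin_edges can yield upper lag edges. It is illustrative only: the random sample distances, the choice of 'sturges', and the filtering by maxlag before the call are assumptions, not a description of the library internals.

```python
import numpy as np

rng = np.random.default_rng(0)
distances = rng.uniform(0, 100, size=500)  # made-up separating distances
maxlag = 60.0

# Assumption: only pairs up to maxlag take part in the binning.
d = distances[distances <= maxlag]

# histogram_bin_edges returns every edge, including the lowest one.
all_edges = np.histogram_bin_edges(d, bins="sturges")

# Keep the upper edges only, one per lag class.
upper_edges = all_edges[1:]
print(len(upper_edges), upper_edges)
```

Note that the number of lag classes is chosen by the method rather than by the caller, which is why this function takes a method_name instead of n.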

skgstat.binning.kmeans(distances, n, maxlag, binning_random_state=42, **kwargs)

Added in version 0.3.9.

Clustering of pairwise separating distances between locations up to maxlag. The lag class edges are formed equidistant from each cluster center. Note: this does not necessarily result in equidistant lag classes.

Parameters:
  • distances (numpy.array) – Flat numpy array representing the upper triangle of the distance matrix.

  • n (integer) – Number of lag classes to find

  • maxlag (integer, float) – Limit the last lag class to this separating distance.

Returns:

bin_edges – The upper bin edges of the lag classes

Return type:

numpy.ndarray

Note

The KMeans algorithm used under the hood is not deterministic, because the starting cluster centroids are seeded randomly. This can yield slightly different results on each run. Thus, for this application, the random_state on KMeans is fixed to a specific value. You can change the seed by passing another value to Variogram as binning_random_state.

Changed in version 1.0.9: KMeans is now initialized as KMeans(n_init=10) as this default value will change in SciKit-Learn 1.4.
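To make the role of the seed and of the cluster centers concrete, here is a small self-contained sketch. It uses a toy one-dimensional Lloyd iteration instead of scikit-learn's KMeans, and both helper names are hypothetical. Placing the interior edges halfway between neighbouring centers and closing the last class at maxlag is one possible reading of "equidistant from each cluster center", stated here as an assumption.

```python
import numpy as np

def kmeans_1d(values, n, seed, n_iter=50):
    """Toy 1-D Lloyd iteration (illustration only, not sklearn's KMeans)."""
    rng = np.random.default_rng(seed)
    # Seed the centroids with n distinct observations picked at random.
    centers = rng.choice(values, size=n, replace=False)
    for _ in range(n_iter):
        # Assign every distance to its nearest centroid.
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        # Move each centroid to the mean of its members (keep it if empty).
        centers = np.array([
            values[labels == k].mean() if np.any(labels == k) else centers[k]
            for k in range(n)
        ])
    return np.sort(centers)

def edges_from_centers(centers, maxlag):
    # Assumption: interior edges lie halfway between neighbouring centers,
    # and the last lag class is closed at maxlag.
    midpoints = (centers[:-1] + centers[1:]) / 2
    return np.append(midpoints, maxlag)

rng = np.random.default_rng(1)
distances = rng.uniform(0, 50, size=200)  # made-up separating distances
maxlag = 40.0
d = distances[distances <= maxlag]

# The same seed reproduces the same edges on every run.
edges_a = edges_from_centers(kmeans_1d(d, 4, seed=42), maxlag)
edges_b = edges_from_centers(kmeans_1d(d, 4, seed=42), maxlag)
print(edges_a)
```

With a fixed seed the random initialisation is repeatable, which is the reason the library pins random_state by default.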

skgstat.binning.ward(distances, n, maxlag, **kwargs)

Added in version 0.3.9.

Clustering of pairwise separating distances between locations up to maxlag. The lag class edges are formed equidistant from each cluster center. Note: this does not necessarily result in equidistant lag classes.

The clustering is done by merging pairs of clusters that minimize the variance for the merged clusters, until n clusters are found.

Parameters:
  • distances (numpy.array) – Flat numpy array representing the upper triangle of the distance matrix.

  • n (integer) – Number of lag classes to find

  • maxlag (integer, float) – Limit the last lag class to this separating distance.

Returns:

bin_edges – The upper bin edges of the lag classes

Return type:

numpy.ndarray

See also

sklearn.cluster.AgglomerativeClustering
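The merging rule can be sketched with a naive agglomeration over a handful of distances. This is a didactic stand-in for sklearn.cluster.AgglomerativeClustering, not the code the library runs; the helper name ward_1d is hypothetical. It uses the standard Ward cost, the increase in the within-cluster sum of squares caused by a merge.

```python
import numpy as np

def ward_1d(values, n):
    """Naive Ward agglomeration on 1-D data; returns the sorted cluster means."""
    # Start with one cluster per observation.
    clusters = [[v] for v in np.sort(values)]
    while len(clusters) > n:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = np.array(clusters[i]), np.array(clusters[j])
                # Ward cost: growth of the within-cluster sum of squares
                # if clusters i and j were merged.
                cost = (len(a) * len(b) / (len(a) + len(b))
                        * (a.mean() - b.mean()) ** 2)
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        # Merge the cheapest pair and continue.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return np.sort([np.mean(c) for c in clusters])

distances = np.array([1.0, 1.2, 1.1, 5.0, 5.3, 5.1, 9.8, 10.0, 10.2])
centers = ward_1d(distances, n=3)
print(centers)
```

The nested loops make this cubic in the number of distances, so it is only suitable for tiny examples like the one above.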

skgstat.binning.stable_entropy_lags(distances, n, maxlag, **kwargs)

Optimizes the lag class edges for n lag classes. The algorithm minimizes the differences between the Shannon entropy of the individual lag classes. Consequently, the final lag classes should be of comparable uncertainty.

Parameters:
  • distances (numpy.array) – Flat numpy array representing the upper triangle of the distance matrix.

  • n (integer) – Number of lag classes to find

  • maxlag (integer, float) – Limit the last lag class to this separating distance.

Keyword Arguments:
  • binning_maxiter (int) – Maximum iterations before the optimization is stopped, if the lag edges do not converge.

  • binning_entropy_bins (int, str) – Binning method for calculating the Shannon entropy on each iteration.

Returns:

bin_edges – The upper bin edges of the lag classes

Return type:

numpy.ndarray
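The quantity being balanced can be sketched as follows. This example only evaluates a candidate set of edges and performs no optimization. The helper names, the histogram-based entropy estimate in base 2, the zero entropy assigned to empty classes, and the use of summed absolute differences as the measure of imbalance are all assumptions made for illustration.

```python
import numpy as np

def shannon_entropy(x, bins):
    """Histogram-based Shannon entropy in bits (illustrative estimate)."""
    counts, _ = np.histogram(x, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log2(p))

def entropy_spread(distances, upper_edges, bins):
    """Entropy of each lag class and the summed differences between neighbours."""
    lower_edges = np.concatenate(([0.0], upper_edges[:-1]))
    h = []
    for low, up in zip(lower_edges, upper_edges):
        members = distances[(distances > low) & (distances <= up)]
        # Assumption: an empty lag class contributes an entropy of 0.
        h.append(shannon_entropy(members, bins) if members.size else 0.0)
    h = np.array(h)
    return h, np.sum(np.abs(np.diff(h)))

rng = np.random.default_rng(0)
distances = rng.gamma(shape=2.0, scale=10.0, size=1000)  # made-up, skewed
maxlag = 60.0
d = distances[distances <= maxlag]

# Shared histogram bins for the entropy estimate (cf. binning_entropy_bins).
bins = np.histogram_bin_edges(d, bins="sqrt")

# Evaluate even-width edges as one candidate; an optimizer would shift
# the edges to make the spread as small as possible.
edges = np.linspace(0, maxlag, 5)[1:]
h, spread = entropy_spread(d, edges, bins)
print(h, spread)
```

A spread close to zero means the lag classes carry similar entropy, which is the property the optimization aims for.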