Binning functions
SciKit-GStat implements a large number of binning functions, which can be used to spatially aggregate the distance matrix into lag classes, or bins. Several functions are available, and most of them can be selected by more than one method identifier:
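All functions below take `distances` as a flat array holding the upper triangle of the distance matrix. As a minimal sketch of what that input looks like, the following builds it with plain NumPy; the coordinates are made up for illustration and are not taken from the library.

```python
import numpy as np

# Four hypothetical sample locations on a plane (x, y).
coords = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]])

# Full pairwise Euclidean distance matrix via broadcasting.
diff = coords[:, None, :] - coords[None, :, :]
dist_matrix = np.sqrt((diff ** 2).sum(axis=-1))

# Keep the strict upper triangle: every pair once, without the zero diagonal.
rows, cols = np.triu_indices(len(coords), k=1)
distances = dist_matrix[rows, cols]

print(distances)
```

For `N` locations the flat array has `N * (N - 1) / 2` entries, one per point pair.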
- skgstat.binning.even_width_lags(distances, n, maxlag)
Even lag edges
Calculate the lag edges for a given number of bins using the same lag step width for all bins.
Changed in version 0.3.8: Function returns None as second value to indicate that the number of lag classes was not changed.
- Parameters:
distances (numpy.array) – Flat numpy array representing the upper triangle of the distance matrix.
n (integer) – Number of lag classes to find
maxlag (integer, float) – Limit the last lag class to this separating distance.
- Returns:
bin_edges – The upper bin edges of the lag classes
- Return type:
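The even-width rule can be sketched in a few lines of NumPy. This is an illustration only, not the library source: `even_width_edges` is a hypothetical helper, and falling back to the largest distance when `maxlag` is missing or too large is an assumption.

```python
import numpy as np

def even_width_edges(distances, n, maxlag=None):
    """Hypothetical helper: n upper bin edges of equal width up to maxlag."""
    d_max = np.nanmax(distances)
    # Assumption: a missing or too large maxlag falls back to the largest distance.
    if maxlag is None or maxlag > d_max:
        maxlag = d_max
    # n + 1 evenly spaced points from 0 to maxlag; drop the leading 0
    # so that only the upper edges remain.
    return np.linspace(0, maxlag, n + 1)[1:]

distances = np.array([3.0, 4.0, 5.0, 5.0, 4.0, 3.0])
edges = even_width_edges(distances, n=4, maxlag=4.0)
print(edges)
```

Every lag class has the same width, but the number of point pairs per class can differ widely.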
- skgstat.binning.uniform_count_lags(distances, n, maxlag)
Uniform lag counts
Calculate the lag edges for a given number of bins with the same number of observations in each lag class. The lag step width will be variable.
Changed in version 0.3.8: Function returns None as second value to indicate that the number of lag classes was not changed.
- Parameters:
distances (numpy.array) – Flat numpy array representing the upper triangle of the distance matrix.
n (integer) – Number of lag classes to find
maxlag (integer, float) – Limit the last lag class to this separating distance.
- Returns:
bin_edges – The upper bin edges of the lag classes
- Return type:
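Equal counts per class amount to placing the upper edges at evenly spaced quantiles of the distances. A minimal sketch under that assumption follows; `uniform_count_edges` is a hypothetical name and the random distances are synthetic.

```python
import numpy as np

def uniform_count_edges(distances, n, maxlag=None):
    """Hypothetical helper: upper edges at the 1/n, 2/n, ..., n/n quantiles."""
    d = np.asarray(distances, dtype=float)
    if maxlag is not None:
        # Only distances up to maxlag take part in the binning.
        d = d[d <= maxlag]
    q = np.arange(1, n + 1) / n
    return np.quantile(d, q)

rng = np.random.default_rng(0)
distances = rng.uniform(0, 100, size=1000)
maxlag = 80.0

edges = uniform_count_edges(distances, n=5, maxlag=maxlag)
d = distances[distances <= maxlag]

# Count the pairs per lag class to check that the classes are balanced.
counts, _ = np.histogram(d, bins=np.concatenate(([0.0], edges)))
print(edges)
print(counts)
```

The class widths now follow the data density, so sparse distance ranges get wider classes.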
- skgstat.binning.auto_derived_lags(distances, method_name, maxlag)
Derive bins automatically
Added in version 0.3.8.
Uses numpy.histogram_bin_edges to derive the lag classes automatically. Supports any method supported by numpy.histogram_bin_edges. It is recommended to use 'sturges', 'doane' or 'fd'.
- Parameters:
- Returns:
bin_edges – The upper bin edges of the lag classes
- Return type:
See also
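To show what the automatic derivation means in practice, the sketch below calls `numpy.histogram_bin_edges` directly with the three recommended methods. Dropping the first edge to keep only upper edges and filtering by `maxlag` beforehand are assumptions made for this illustration; the distances are synthetic.

```python
import numpy as np

rng = np.random.default_rng(42)
distances = rng.uniform(0, 100, size=500)
maxlag = 60.0

# Only distances up to maxlag take part in the binning.
d = distances[distances <= maxlag]

results = {}
for method in ("sturges", "doane", "fd"):
    # The estimator decides how many classes to use; drop the lowest edge
    # so that only upper bin edges remain.
    edges = np.histogram_bin_edges(d, bins=method)[1:]
    results[method] = edges
    print(method, len(edges))
```

Unlike the other binning functions, the number of lag classes is an outcome of the method here and not an input.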
- skgstat.binning.kmeans(distances, n, maxlag, binning_random_state=42, **kwargs)
Added in version 0.3.9.
Clustering of pairwise separating distances between locations up to maxlag. The lag class edges are formed equidistant from each cluster center. Note: this does not necessarily result in equidistant lag classes.
- Parameters:
distances (numpy.array) – Flat numpy array representing the upper triangle of the distance matrix.
n (integer) – Number of lag classes to find
maxlag (integer, float) – Limit the last lag class to this separating distance.
- Returns:
bin_edges – The upper bin edges of the lag classes
- Return type:
See also
Note
The KMeans that is used under the hood is not a deterministic algorithm, as the starting cluster centroids are seeded randomly. This can yield slightly different results on each run. Thus, for this application, the random_state on KMeans is fixed to a specific value. You can change the seed by passing another seed to Variogram as binning_random_state.
Changed in version 1.0.9: KMeans is now initialized as KMeans(n_init=10) as this default value will change in SciKit-Learn 1.4.
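The idea of clustering distances and placing edges between cluster centers can be sketched without scikit-learn. The block below swaps in a plain NumPy Lloyd iteration for KMeans, so it is not the library's implementation; `kmeans_1d_edges` is a hypothetical helper, and using midpoints between neighbouring centers with `maxlag` as the last upper edge is an assumption. It also shows why a fixed seed gives repeatable edges.

```python
import numpy as np

def kmeans_1d_edges(distances, n, maxlag, seed=42, n_iter=50):
    """Hypothetical helper: 1-D Lloyd iterations, then midpoint edges."""
    d = np.asarray(distances, dtype=float)
    d = d[d <= maxlag]
    rng = np.random.default_rng(seed)
    # Seed the centers with n distinct observed distances.
    centers = np.sort(rng.choice(np.unique(d), size=n, replace=False))
    for _ in range(n_iter):
        # Assign every distance to its nearest center.
        labels = np.argmin(np.abs(d[:, None] - centers[None, :]), axis=1)
        # Move each center to the mean of its members; an empty cluster stays put.
        new_centers = np.sort([
            d[labels == k].mean() if np.any(labels == k) else centers[k]
            for k in range(n)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Assumption: interior edges lie halfway between neighbouring centers,
    # and the last upper edge is maxlag itself.
    midpoints = (centers[:-1] + centers[1:]) / 2
    return np.append(midpoints, maxlag)

rng = np.random.default_rng(1)
distances = rng.uniform(0, 100, size=400)

edges_a = kmeans_1d_edges(distances, n=5, maxlag=80.0)
edges_b = kmeans_1d_edges(distances, n=5, maxlag=80.0)
print(edges_a)
```

Calling the helper twice with the same seed yields identical edges, while a different `seed` may shift them slightly.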
- skgstat.binning.ward(distances, n, maxlag, **kwargs)
Added in version 0.3.9.
Clustering of pairwise separating distances between locations up to maxlag. The lag class edges are formed equidistant from each cluster center. Note: this does not necessarily result in equidistant lag classes.
The clustering is done by merging pairs of clusters that minimize the variance for the merged clusters, until n clusters are found.
- Parameters:
distances (numpy.array) – Flat numpy array representing the upper triangle of the distance matrix.
n (integer) – Number of lag classes to find
maxlag (integer, float) – Limit the last lag class to this separating distance.
- Returns:
bin_edges – The upper bin edges of the lag classes
- Return type:
See also
sklearn.cluster.AgglomerativeClustering
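The merging rule described for Ward clustering can be illustrated on a handful of one-dimensional distances. The following is a naive sketch, not the scikit-learn algorithm the library relies on: `ward_1d_centers` is a hypothetical helper that tests all cluster pairs in each step, and the midpoint edges with `maxlag` as last edge are an assumption.

```python
import numpy as np

def ward_1d_centers(values, n):
    """Hypothetical helper: naive Ward-style agglomeration of 1-D values."""
    # Start with every value in its own cluster.
    clusters = [[v] for v in sorted(values)]
    while len(clusters) > n:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i], clusters[j]
                # Increase in within-cluster sum of squares if a and b merge.
                cost = (len(a) * len(b) / (len(a) + len(b))
                        * (np.mean(a) - np.mean(b)) ** 2)
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        # Merge the cheapest pair and drop the absorbed cluster.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return np.sort([np.mean(c) for c in clusters])

distances = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1, 9.0, 9.1, 9.2]
maxlag = 10.0

centers = ward_1d_centers(distances, n=3)
# Assumption: interior edges are midpoints between neighbouring centers.
edges = np.append((centers[:-1] + centers[1:]) / 2, maxlag)
print(centers)
print(edges)
```

The pairwise search is quadratic per merge step, so this sketch only suits small inputs.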
- skgstat.binning.stable_entropy_lags(distances, n, maxlag, **kwargs)
Optimizes the lag class edges for n lag classes. The algorithm minimizes the differences between the Shannon entropies of the lag classes. Consequently, the final lag classes should be of comparable uncertainty.
- Parameters:
distances (numpy.array) – Flat numpy array representing the upper triangle of the distance matrix.
n (integer) – Number of lag classes to find
maxlag (integer, float) – Limit the last lag class to this separating distance.
- Keyword Arguments:
- Returns:
bin_edges – The upper bin edges of the lag classes
- Return type:
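The quantity being minimized can be sketched as follows. This only evaluates a candidate set of edges and does not run an optimizer; `shannon_entropy` and `entropy_spread` are hypothetical helpers, and both the shared histogram used to discretise the distances and the summed absolute difference between neighbouring classes are assumptions about how such a loss could look.

```python
import numpy as np

def shannon_entropy(values, bins):
    """Shannon entropy in bits of values discretised on shared histogram bins."""
    counts, _ = np.histogram(values, bins=bins)
    total = counts.sum()
    if total == 0:
        return 0.0
    p = counts[counts > 0] / total
    return float(-np.sum(p * np.log2(p)))

def entropy_spread(distances, edges, bins):
    """Hypothetical loss: summed absolute entropy differences between
    neighbouring lag classes."""
    lower = np.concatenate(([0.0], edges[:-1]))
    h = []
    for lo, up in zip(lower, edges):
        members = distances[(distances > lo) & (distances <= up)]
        h.append(shannon_entropy(members, bins))
    return float(np.sum(np.abs(np.diff(h))))

rng = np.random.default_rng(7)
distances = rng.uniform(0, 100, size=2000)

# One fine histogram shared by all lag classes.
bins = np.histogram_bin_edges(distances, bins=50)

even = np.linspace(0, 100, 5)[1:]            # four equally wide classes
skewed = np.array([5.0, 10.0, 20.0, 100.0])  # three narrow classes, one wide

spread_even = entropy_spread(distances, even, bins)
spread_skewed = entropy_spread(distances, skewed, bins)
print(spread_even, spread_skewed)
```

For these uniformly distributed distances the even edges give the smaller spread; an optimizer would move the edges to shrink this value further.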