Scikit-GStat implements several semi-variance estimators, which can be
found in the skgstat.estimators submodule. Each of these functions can be
used independently of the Variogram class. In that case, the estimator
expects an array of pairwise differences from which to calculate the
semi-variance, not the observed values themselves.
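To illustrate what "an array of pairwise differences" means, the following minimal sketch builds such an array with NumPy only. The sample coordinates, values and the lag class bounds are hypothetical and chosen for illustration; this is not how the Variogram class prepares its input internally.

```python
import numpy as np

# Hypothetical sample: five observations along a 1D transect.
coords = np.array([0.0, 1.0, 2.0, 4.0, 7.0])
values = np.array([1.0, 3.0, 2.0, 6.0, 5.0])

# Indices of all unique point pairs (i < j).
i, j = np.triu_indices(len(values), k=1)

# Separating distance and absolute difference in value space per pair.
distances = np.abs(coords[i] - coords[j])
diffs = np.abs(values[i] - values[j])

# Keep only the pairs that fall into one lag class, here 0 < h <= 2.
lag_diffs = diffs[(distances > 0) & (distances <= 2)]
print(lag_diffs)
```

An array like lag_diffs, not values, is the kind of input the estimators expect.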
Calculates the Matheron Semi-Variance from an array of pairwise differences.
Returns the semi-variance for the whole array. In case a semi-variance is
needed for multiple groups, this function has to be mapped on each group.
That is the typical use case in geostatistics.
Parameters:
x (numpy.ndarray) – Array of pairwise differences. These values should be the distances
between pairwise observations in value space. If x[i] and x[i+h] fall
into the h separating distance class, x should contain abs(x[i] - x[i+h])
as an element.
Return type:
numpy.float64
Notes
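This implementation follows the original publication [1] and the
notes on its application [2]. Following the 1962 publication [1],
the semi-variance is calculated as:
\gamma(h) = \frac{1}{2 N_h} \sum_{i=1}^{N_h} x_i^2
where x_i are the pairwise differences in x and N_h is their number.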
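A minimal sketch of the Matheron estimator, assuming its standard form (half the mean of the squared pairwise differences), and of mapping it over several lag classes. The function name matheron_sketch and the sample lag classes are hypothetical; this is not the library's own code.

```python
import numpy as np

def matheron_sketch(x):
    """Illustrative Matheron semi-variance of pairwise differences x."""
    x = np.asarray(x, dtype=float)
    if x.size == 0:
        return np.nan  # no pairs in this lag class
    # Half the mean squared difference.
    return np.sum(x ** 2) / (2.0 * x.size)

# Hypothetical pairwise differences, already grouped by lag class.
lag_classes = [np.array([2.0, 1.0, 1.0, 4.0]), np.array([3.0, 3.0])]

# One semi-variance per lag class.
semivariances = [matheron_sketch(d) for d in lag_classes]
print(semivariances)
```

Mapping the function over the groups yields one value per lag class, which is what an experimental variogram consists of.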
Calculates the Cressie-Hawkins Semi-Variance from an array of pairwise
differences. Returns the semi-variance for the whole array. In case a
semi-variance is needed for multiple groups, this function has to be
mapped on each group. That is the typical use case in geostatistics.
Parameters:
x (numpy.ndarray) – Array of pairwise differences. These values should be the distances
between pairwise observations in value space. If x[i] and x[i+h] fall
into the h separating distance class, x should contain abs(x[i] - x[i+h])
as an element.
Return type:
numpy.float64
Notes
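This implementation follows the 1980 publication by Cressie and
Hawkins [3]:
2\gamma(h) = \frac{\left(\frac{1}{N_h} \sum_{i=1}^{N_h} |x_i|^{0.5}\right)^4}{0.457 + \frac{0.494}{N_h} + \frac{0.045}{N_h^2}}
where x_i are the pairwise differences in x and N_h is their number.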
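A minimal sketch of the Cressie-Hawkins estimator, assuming the commonly cited form: the fourth power of the mean square-root difference, divided by the bias correction 0.457 + 0.494/N + 0.045/N². The function name cressie_sketch is hypothetical, and the code is an illustration rather than the library's implementation.

```python
import numpy as np

def cressie_sketch(x):
    """Illustrative Cressie-Hawkins semi-variance of pairwise differences x."""
    x = np.abs(np.asarray(x, dtype=float))
    n = x.size
    if n == 0:
        return np.nan  # no pairs in this lag class
    # Fourth power of the mean square-root difference.
    numerator = np.mean(np.sqrt(x)) ** 4
    # Bias correction term depending on the number of pairs.
    denominator = 0.457 + 0.494 / n + 0.045 / n ** 2
    # The ratio estimates 2 * gamma, so halve it.
    return numerator / (2.0 * denominator)

print(cressie_sketch([4.0, 4.0, 4.0, 4.0]))
```

Working on square roots of the differences dampens the influence of single large values compared to squaring them.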
Calculates the Dowd semi-variance from an array of pairwise
differences. Returns the semi-variance for the whole array. In case a
semi-variance is needed for multiple groups, this function has to be
mapped on each group. That is the typical use case in geostatistics.
Parameters:
x (numpy.ndarray) – Array of pairwise differences. These values should be the distances
between pairwise observations in value space. If x[i] and x[i+h] fall
into the h separating distance class, x should contain abs(x[i] - x[i+h])
as an element.
Return type:
numpy.float64
Notes
The Dowd estimator is based on the median of all pairwise differences in
each lag class and is therefore robust to extreme values at the cost of
variability.
This implementation follows Dowd’s publication [4]:
2\gamma(h) = 2.198 \cdot \mathrm{median}(x)^2
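A minimal sketch of a median-based estimator in the spirit of Dowd, assuming the commonly cited relation 2γ(h) = 2.198 · median(x)². The function name dowd_sketch is hypothetical, and halving the result to obtain γ is an assumption of this sketch; the library may treat that factor differently.

```python
import numpy as np

def dowd_sketch(x):
    """Illustrative Dowd semi-variance of pairwise differences x."""
    x = np.abs(np.asarray(x, dtype=float))
    if x.size == 0:
        return np.nan  # no pairs in this lag class
    # 2 * gamma = 2.198 * median^2 (assumed), so halve it.
    return 0.5 * 2.198 * np.median(x) ** 2

# The single extreme value (100) does not move the median.
print(dowd_sketch([1.0, 2.0, 3.0, 4.0, 100.0]))
```

Because only the median enters the result, replacing 100 by any other value larger than 3 leaves the estimate unchanged.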
Return the Genton semi-variance of the given sample x. Genton is a highly
robust variogram estimator that is designed to be location free and
robust to extreme values in x.
Genton is based on calculating kth order statistics and will, for large
data sets, be close or equal to the 25% quantile of all ordered point pairs
in x.
Parameters:
x (numpy.ndarray) – Array of pairwise differences. These values should be the distances
between pairwise observations in value space. If x[i] and x[i+h] fall
into the h separating distance class, x should contain abs(x[i] - x[i+h])
as an element.
Return type:
numpy.float64
Notes
The Genton estimator is described in great detail in the original
publication [5] and is defined as:
Q_{N_h} = 2.2191\{|V_i(h) - V_j(h)|; i < j\}_{(k)}
and
k = \binom{[N_h / 2] + 1}{2}
and
q = \binom{N_h}{2}
where the estimate is the kth order statistic of all q point pairs. For
large N_h, k/q will be close to 0.25. For N_h >= 500, k/q equals 0.25 to
two decimal places; it is therefore set to 0.25 and the two binomial
coefficients k and q are not calculated.
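A minimal sketch of the scale estimate Q as defined above: all absolute differences between pairs of elements of x are sorted and the kth order statistic is scaled by 2.2191. The function name genton_scale_sketch is hypothetical, and the brackets in k are read as the integer part, which is an assumption. The sketch returns Q only; converting Q into a semi-variance is left out.

```python
from math import comb

import numpy as np

def genton_scale_sketch(x):
    """Illustrative Genton scale estimate Q of pairwise differences x."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if n < 2:
        return np.nan  # at least one pair of elements is needed
    # All |x_i - x_j| with i < j, sorted ascending (q = comb(n, 2) values).
    i, j = np.triu_indices(n, k=1)
    pair_diffs = np.sort(np.abs(x[i] - x[j]))
    # Rank of the order statistic, k = binom([n / 2] + 1, 2).
    k = comb(n // 2 + 1, 2)
    # k is 1-based, array indices are 0-based.
    return 2.2191 * pair_diffs[k - 1]

print(genton_scale_sketch([1.0, 2.0, 4.0, 8.0]))

# Ratio k / q for N = 500, illustrating the 0.25 approximation.
ratio_500 = comb(500 // 2 + 1, 2) / comb(500, 2)
print(round(ratio_500, 2))
```

Note that the number of pairs grows quadratically with the length of x, so this direct approach is only practical for moderately sized lag classes.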
Calculates the Shannon Entropy H as a variogram estimator. It is highly
recommended to calculate the bins and explicitly set them as a list.
In case this function is called for more than one lag class in a
variogram, setting bins to None would result in different bin edges in
each lag class. This would be very difficult to interpret.
Parameters:
x (numpy.ndarray) – Array of pairwise differences. These values should be the distances
between pairwise observations in value space. If x[i] and x[i+h] fall
into the h separating distance class, x should contain abs(x[i] - x[i+h])
as an element.
bins (int, list, str) – List of the bin edges used to calculate the empirical distribution of x.
If bins is a list, these values are used directly. In case bins is an
integer, that many equal-width bins will be calculated between the
minimum and maximum value of x. In case bins is a string, it will be
passed as the bins argument to the numpy.histogram function.
Returns:
entropy – Shannon entropy of the given pairwise differences.
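A minimal sketch of a Shannon entropy estimate with explicit bin edges that are shared by all lag classes, as recommended above. The function name entropy_sketch, the sample data and the use of the base-2 logarithm are assumptions made for illustration.

```python
import numpy as np

def entropy_sketch(x, bins):
    """Illustrative Shannon entropy (in bits) of pairwise differences x."""
    counts, _ = np.histogram(x, bins=bins)
    total = counts.sum()
    if total == 0:
        return np.nan  # no value falls into any bin
    p = counts / total
    p = p[p > 0]  # empty bins do not contribute
    return -np.sum(p * np.log2(p))

# The same explicit bin edges are used for every lag class.
edges = [0.0, 1.0, 2.0, 3.0, 4.0]
spread_out = np.array([0.5, 1.5, 2.5, 3.5])    # one value per bin
concentrated = np.array([0.5, 0.6, 0.7, 0.8])  # all values in one bin

print(entropy_sketch(spread_out, edges))
print(entropy_sketch(concentrated, edges))
```

With fixed edges the two results are directly comparable; with bins=None each call could derive its own edges and the values could no longer be compared across lag classes.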
This is an experimental semi-variance estimator. It is heavily influenced
by extreme values and outliers. That behaviour is usually not desired in
geostatistics.
Returns a custom value. This estimator is the difference of maximum and
minimum pairwise differences, normalized by the mean. MinMax will be very
sensitive to extreme values.
Only use this estimator if you know what you are doing. It is
experimental and might change its behaviour in a future version.
Parameters:
x (numpy.ndarray) – Array of pairwise differences. These values should be the distances
between pairwise observations in value space. If x[i] and x[i+h] fall
into the h separating distance class, x should contain abs(x[i] - x[i+h])
as an element.
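A minimal sketch of the MinMax idea as described above, the range of the pairwise differences divided by their mean. The function name minmax_sketch and the sample data are hypothetical; the example mainly shows how strongly a single outlier changes the result.

```python
import numpy as np

def minmax_sketch(x):
    """Illustrative range of pairwise differences x, normalized by the mean."""
    x = np.asarray(x, dtype=float)
    if x.size == 0:
        return np.nan  # no pairs in this lag class
    return (np.max(x) - np.min(x)) / np.mean(x)

regular = minmax_sketch([1.0, 2.0, 3.0])
with_outlier = minmax_sketch([1.0, 2.0, 3.0, 94.0])
print(regular, with_outlier)
```

A single extreme difference moves the value from 1.0 to 3.72 in this example, which illustrates the sensitivity mentioned above.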
This is an experimental semi-variance estimator. It uses just a
percentile of the given pairwise differences and does not bear any
information about their variance.
Returns a given percentile as semi-variance. Only use this estimator if
you know what you are doing. It is experimental and might change its
behaviour in a future version.
Parameters:
x (numpy.ndarray) – Array of pairwise differences. These values should be the distances
between pairwise observations in value space. If x[i] and x[i+h] fall
into the h separating distance class, x should contain abs(x[i] - x[i+h])
as an element.
p (int) – Desired percentile. Should be given as a whole number with 0 < p < 100.
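A minimal sketch of a percentile-based estimator built on numpy.percentile. The function name percentile_sketch and the default of p=50 are assumptions for illustration and are not taken from the library.

```python
import numpy as np

def percentile_sketch(x, p=50):
    """Illustrative p-th percentile of pairwise differences x."""
    x = np.asarray(x, dtype=float)
    if x.size == 0:
        return np.nan  # no pairs in this lag class
    # p is given in percent, e.g. 50 for the median.
    return np.percentile(x, p)

diffs = [1.0, 2.0, 3.0, 4.0, 5.0]
print(percentile_sketch(diffs, p=50))
print(percentile_sketch(diffs, p=25))
```

The result is a value taken from the distribution of the differences themselves, so it is on the scale of the data rather than on the squared scale of a variance.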