Variogram Class#

class skgstat.Variogram(coordinates=None, values=None, estimator='matheron', model='spherical', dist_func='euclidean', bin_func='even', normalize=False, fit_method='trf', fit_sigma=None, use_nugget=False, maxlag=None, samples=None, n_lags=10, verbose=False, **kwargs)[source]#

Variogram Class

Calculates a variogram of the separating distances in the given coordinates and relates them to one of the semi-variance measures of the given dependent values.

__init__(coordinates=None, values=None, estimator='matheron', model='spherical', dist_func='euclidean', bin_func='even', normalize=False, fit_method='trf', fit_sigma=None, use_nugget=False, maxlag=None, samples=None, n_lags=10, verbose=False, **kwargs)[source]#

Variogram Class

Parameters:
  • coordinates (numpy.ndarray, MetricSpace) –

    Changed in version 0.5.0: now accepts MetricSpace

    Array of shape (m, n). Will be used as m observation points of n-dimensions. This variogram can be calculated on 1 - n dimensional coordinates. In case a 1-dimensional array is passed, a second array of same length containing only zeros will be stacked to the passed one. For very large datasets, you can set maxlag to only calculate distances within the maximum lag in a sparse matrix. Alternatively you can supply a MetricSpace (optionally with a max_dist set for the same effect). This is useful if you’re creating many different variograms for different measured parameters that are all measured at the same set of coordinates, as distances will only be calculated once, instead of once per variogram.

  • values (numpy.ndarray) –

    Changed in version 1.0.5: Now accepts co-variables for calculating cross variograms.

    Array of values observed at the given coordinates. The length of the values array has to match the m dimension of the coordinates array. Will be used to calculate the dependent variable of the variogram. If the values are of shape (n_samples, 2), a cross-variogram will be calculated. This assumes the main variable and the co-variable to be co-located under Markov-model 1 assumptions, meaning the variables need to be conditionally independent.

  • estimator (str, callable) –

    String identifying the semi-variance estimator to be used. Defaults to the Matheron estimator. Possible values are:

    • matheron [Matheron, default]

    • cressie [Cressie-Hawkins]

    • dowd [Dowd-Estimator]

    • genton [Genton]

    • minmax [MinMax Scaler]

    • entropy [Shannon Entropy]

    If a callable is passed, it has to accept an array of absolute differences, aligned to the 1D distance matrix (flattened upper triangle) and return a scalar, that converges towards small values for similarity (high covariance).

  • model (str | Callable) –

    Changed in version 1.0.12: Added support for sum of models (e.g., “spherical+gaussian”), or custom model (Callable). Using fit_bounds to optimize the fit is recommended for custom models, and can be useful for sum of models.

    String or callable identifying the theoretical variogram function to be used to describe the experimental variogram. Can be one of:

    • spherical [Spherical, default]

    • exponential [Exponential]

    • gaussian [Gaussian]

    • cubic [Cubic]

    • stable [Stable model]

    • matern [Matérn model]

    • nugget [nugget effect variogram]

    Any number of these theoretical models can be summed using “+” iteratively, e.g. “spherical+cubic+matern”. The nugget parameters of the models are removed except for the last model (sum of nuggets = single nugget).

  • dist_func (str) – String identifying the distance function. Defaults to ‘euclidean’. Can be any metric accepted by scipy.spatial.distance.pdist. Additional parameters are not (yet) passed through to pdist. These are accepted by pdist for some of the metrics. In these cases the default values are used.

  • bin_func (str | Callable | Iterable) –

    Changed in version 0.3.8: added ‘fd’, ‘sturges’, ‘scott’, ‘sqrt’, ‘doane’

    Changed in version 0.3.9: added ‘kmeans’, ‘ward’

    String identifying the binning function used to find lag class edges. All methods calculate bin edges on the interval [0, maxlag[. Possible values are:

    • 'even' (default) finds n_lags same width bins

    • 'uniform' forms n_lags bins of same data count

    • 'fd' applies Freedman-Diaconis estimator to find n_lags

    • 'sturges' applies Sturges' rule to find n_lags.

    • 'scott' applies Scott’s rule to find n_lags

    • 'doane' applies Doane’s extension to Sturges' rule to find n_lags

    • 'sqrt' uses the square root of the number of distances as n_lags.

    • 'kmeans' uses KMeans clustering to find well supported bins

    • 'ward' uses hierarchical clustering to find minimum-variance clusters.

    More details are given in the documentation for set_bin_func.

  • normalize (bool) – Defaults to False. If True, the independent and dependent variable will be normalized to the range [0,1].

  • fit_method (str | None) –

    Changed in version 0.3.10: Added ‘ml’ and ‘custom’

    String identifying the method to be used for fitting the theoretical variogram function to the experimental. If None is passed, the fit does not run. More info is given in the Variogram.fit docs. Can be one of:

    • ’lm’: Levenberg-Marquardt algorithm for unconstrained problems. This is the faster algorithm, but fitting a variogram is not an unconstrained problem.

    • ’trf’: Trust Region Reflective function for non-linear constrained problems. The class will set the boundaries itself. This is the default function.

    • ’ml’: Maximum-Likelihood estimation. With the current implementation only the Nelder-Mead solver for unconstrained problems is implemented. This will estimate the variogram parameters from a Gaussian parameter space by minimizing the negative log-likelihood.

    • ’manual’: Manual fitting. You can set the range, sill and nugget either directly to the fit function, or as fit_ prefixed keyword arguments on Variogram instantiation.

  • fit_sigma (numpy.ndarray, str) –

    Defaults to None. The sigma is used as a measure of uncertainty during the variogram fit. If fit_sigma is an array, it has to hold n_lags elements, giving the uncertainty for each lag class. If fit_sigma is None (default), the lags are not weighted, so all lag classes contribute equally. Higher values indicate higher uncertainty and will lower the influence of the corresponding lag class on the fit. If fit_sigma is a string, a pre-defined function of separating distance will be used to fill the array. Can be one of:

    • ’linear’: Linear loss with distance. Small bins will have higher impact.

    • ’exp’: The weights decrease by an exponential function of distance

    • ’sqrt’: The weights decrease by the square root of distance

    • ’sq’: The weights decrease by the squared distance.

    More info is given in the Variogram.fit_sigma documentation.

  • use_nugget (bool) – Defaults to False. If True, a nugget effect will be added to all Variogram.models as a third (or fourth) fitting parameter. A nugget is essentially the y-axis intercept of the theoretical variogram function. For a sum of variogram models, the nugget is defined in the last model.

  • maxlag (float, str) – Can specify the maximum lag distance directly by giving a value larger than 1. The binning function will not find any lag class with an edge larger than maxlag. If 0 < maxlag < 1, then maxlag is relative and maxlag * max(Variogram.distance) will be used. In case maxlag is a string, it has to be one of ‘median’, ‘mean’. Then the median or mean of all Variogram.distance will be used. Note that maxlag=0.5 will use half the maximum separating distance. This is not the same as ‘median’, which is the median of all separating distances.

  • samples (float, int) – If set to a non-None value, point pairs are sampled randomly. Two random subsets of all points are chosen, and the distance matrix is calculated only between these two subsets. The size of each subset is set by samples: if < 1 it specifies a fraction of all points, if >= 1 it specifies the number of points in each subset.

  • n_lags (int) – Specify the number of lag classes to be defined by the binning function.

  • verbose (bool) – Set the Verbosity of the class. Not Implemented yet.
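As a rough illustration of the summed models and the single nugget described for model and use_nugget, the sketch below adds a textbook spherical and a textbook Gaussian component plus one shared nugget. The function names and the exact parameterization are assumptions made for this example; skgstat's own model functions may scale the range differently.

```python
import math


def spherical(h, r, c0):
    """Textbook spherical model without nugget: reaches the sill c0 at range r."""
    if h >= r:
        return c0
    ratio = h / r
    return c0 * (1.5 * ratio - 0.5 * ratio ** 3)


def gaussian(h, r, c0):
    """Textbook Gaussian model without nugget: approaches the sill c0 asymptotically."""
    return c0 * (1.0 - math.exp(-(h ** 2) / (r ** 2)))


def summed_model(h, r1, c1, r2, c2, nugget=0.0):
    """Sum of two nugget-free components plus a single shared nugget."""
    return spherical(h, r1, c1) + gaussian(h, r2, c2) + nugget


# At lag 0 only the nugget remains; far beyond both ranges the value
# approaches sill1 + sill2 + nugget.
value_at_zero = summed_model(0.0, 10.0, 2.0, 30.0, 1.0, nugget=0.5)
value_far = summed_model(1000.0, 10.0, 2.0, 30.0, 1.0, nugget=0.5)
```

The nugget appears once in the sum, which mirrors the statement that the nuggets of all but the last model are removed.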

Keyword Arguments:
  • entropy_bins (int, str) –

    Added in version 0.3.7.

    If the estimator <skgstat.Variogram.estimator> is set to 'entropy' this argument sets the number of bins, that should be used for histogram calculation.

  • percentile (int) –

    Added in version 0.3.7.

    If the estimator <skgstat.Variogram.estimator> is set to 'entropy' this argument sets the percentile to be used.

  • binning_random_state (int, None) –

    Added in version 0.3.9.

    If bin_func is 'kmeans' this can overwrite the seed for the initial guess of the cluster centroids. Note, that K-Means is not deterministic and is therefore seeded to 42 here. You can pass None to disable this behavior, but use it with care, as you will get different results.

  • binning_agg_func (str) –

    Added in version 0.3.10.

    If bin_func is 'ward' this keyword argument can switch from default mean aggregation to median aggregation for calculating the cluster centroids.

  • obs_sigma (int, float) –

    Added in version 0.6.0.

    If set, the Variogram will use this sigma as the standard deviation of the observations passed as values. Using a MonteCarlo simulation the uncertainties are propagated into the experimental variogram. If present, the plot will indicate the confidence interval as error bars around the experimental variogram.

  • fit_bounds (2-tuple of array_like or Bounds, optional) –

    Added in version 1.0.12.

    Lower and upper bounds on parameters passed to scipy.optimize.curve_fit.

    Order is typically (range, sill, nugget) or (range, sill, smoothness, nugget) for individual models, or (range1, sill1, nugget1, range2, sill2, nugget2) for a sum of 2 models. Recommended for custom models, where bounds cannot be determined logically. For internal models, defaults to known min/max values for the sill (0, max variance), range (0, max lag) and smoothness (0, 2) or (0, 20) for stable and matern, respectively.

  • fit_p0 (array_like, optional) –

    Added in version 1.0.12.

    Initial guess for the parameters passed to scipy.optimize.curve_fit.

    Same order as for fit_bounds. Defaults to upper bounds values. For custom models, if no bounds are defined, defaults to 1.
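The structure of fit_bounds and fit_p0 can be sketched for a single model with nugget. This is a minimal illustration, assuming the parameter order (range, sill, nugget) given above; the lag and semi-variance numbers are made up, and the upper bound chosen for the nugget is an assumption of this example, not a documented default.

```python
# Illustrative experimental variogram (made-up numbers)
lags = [5.0, 10.0, 15.0, 20.0]
semivariances = [0.8, 1.9, 2.4, 2.5]

max_lag = max(lags)
max_var = max(semivariances)

# Parameter order: (range, sill, nugget)
lower = [0.0, 0.0, 0.0]
upper = [max_lag, max_var, max_var]  # nugget upper bound is an assumption

# 2-tuple of array_like: (lower bounds, upper bounds)
fit_bounds = (lower, upper)

# Initial guess in the same order, here set to the upper bounds
fit_p0 = list(upper)
```

Both objects would then be passed as the fit_bounds and fit_p0 keyword arguments.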

property coordinates#

Coordinates property

Array of observation locations the variogram is built for. This property has no setter. If you want to change the coordinates, use a new Variogram instance.

Returns:

coordinates

Return type:

numpy.array

property metric_space#

Added in version 0.5.6.

MetricSpace representation of the input coordinates. A MetricSpace can be used to pass pre-calculated coordinates to other Variogram instances.

Returns:

metric_space

Return type:

skgstat.MetricSpace

See also

Variogram.coordinates

coordinate representation

property dim#

Input coordinates dimensionality.

property values#

Values property

Array of observations the variogram is built for. The setter of this property utilizes the Variogram.set_values function for setting new arrays.

Returns:

values

Return type:

numpy.ndarray

property value_matrix#

Value matrix

Returns a matrix of pairwise differences in absolute values. The matrix will have the shape (m, m) with m = len(Variogram.values). Note that Variogram.values holds the values themselves, while the value_matrix consists of their pairwise differences.

Returns:

values – Matrix of pairwise absolute differences of the values.

Return type:

numpy.matrix

See also

Variogram._diff
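The relation between values and value_matrix can be shown with plain Python lists. This is only a sketch of the definition above (pairwise absolute differences in an (m, m) layout), not the library's implementation.

```python
values = [1.0, 4.0, 2.5]
m = len(values)

# Entry (i, j) holds |values[i] - values[j]|
value_matrix = [
    [abs(values[i] - values[j]) for j in range(m)]
    for i in range(m)
]
```

The result is symmetric with zeros on the diagonal, so only the upper triangle carries independent information.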

set_values(values, calc_diff=True)[source]#

Set new values

Will set the passed array as the new value array. This array has to be of the same length as the first axis of the coordinates array. The Variogram class only accepts one-dimensional arrays. On success, all fitting parameters are deleted and the pairwise differences are recalculated. Raises a ValueError on shape mismatches and a Warning if all input values are the same.

Changed: a warnings.warn message is now issued if all input data are the same.

Parameters:

values (numpy.ndarray)

Return type:

void

Raises:
  • ValueError – raised if the values array shape does not match the coordinates array, or if more than one dimension is given

  • Warning – raised if all input values are the same

See also

Variogram.values

property pairwise_diffs#

Added in version 1.0.4.

Pairwise residual differences of the input data. The property should be used over the Variogram._diff attribute, as this will contain multiple targets with future releases to implement cross-variograms.

property bin_func#

Binning function

Returns an instance of the function used for binning the separating distances into the given number of bins. All binning functions share the signature func(distances, n, maxlag).

The setter of this property utilizes the Variogram.set_bin_func to set a new function.

Returns:

binning_function

Return type:

function

set_bin_func(bin_func: str | Iterable | Callable[[ndarray, float, float], Tuple[ndarray, float]])[source]#

Set binning function

Sets a new binning function to be used. The new binning method is set by either a string identifying the new function to be used, or an iterable containing the bin edges, or any function that can compute bins from the distances, number of lags and maximum lag. The string can be one of: [‘even’, ‘uniform’, ‘fd’, ‘sturges’, ‘scott’, ‘sqrt’, ‘doane’].

If the number of lag classes should be estimated automatically, it is recommended to use ‘sturges’ for small, normally distributed locations and ‘fd’ or ‘scott’ for large datasets, where ‘fd’ is more robust to outliers. ‘sqrt’ is by far the fastest estimator. ‘doane’ is an extension of Sturges' rule for non-normally distributed data.

Changed in version 0.3.8: added ‘fd’, ‘sturges’, ‘scott’, ‘sqrt’, ‘doane’

Changed in version 0.3.9: added ‘kmeans’, ‘ward’

Changed in version 0.4.0: added ‘stable_entropy’

Changed in version 0.4.1: refactored local wrapper function definition. The wrapper to pass kwargs to the binning functions is now implemented as a instance method, to make it pickleable.

Changed in version 0.6.5: added iterable and function as arguments to allow for custom bins.

Parameters:

bin_func (str | Iterable | Callable) –

Can be one of:

  • ’even’

  • ’uniform’

  • ’fd’

  • ’sturges’

  • ’scott’

  • ’sqrt’

  • ’doane’

  • ’kmeans’

  • ’ward’

  • ’stable_entropy’

Return type:

void

Notes

`’even’`: Use skgstat.binning.even_width_lags for using n_lags lags of equal width up to maxlag.

`’uniform’`: Use skgstat.binning.uniform_count_lags for using n_lags lags up to maxlag in which the pairwise differences follow a uniform distribution.
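The difference between the two schemes can be sketched in plain Python. This is an illustration only, assuming that bins are described by their upper edges and that the equal-count edges are taken as linearly interpolated quantiles; the helper names are hypothetical and are not the functions in skgstat.binning.

```python
import math


def even_width_edges(n, maxlag):
    """Upper edges of n equally wide bins up to maxlag."""
    return [maxlag * (i + 1) / n for i in range(n)]


def uniform_count_edges(distances, n, maxlag):
    """Upper edges at the i/n quantiles, so each bin holds a similar number of pairs."""
    d = sorted(x for x in distances if x <= maxlag)
    edges = []
    for i in range(1, n + 1):
        pos = (len(d) - 1) * i / n          # fractional index of the quantile
        lo = int(math.floor(pos))
        hi = min(lo + 1, len(d) - 1)
        edges.append(d[lo] + (d[hi] - d[lo]) * (pos - lo))
    return edges


distances = [1.0, 1.0, 2.0, 2.0, 3.0, 10.0, 20.0, 40.0]
even = even_width_edges(2, 40.0)                   # equal widths, unequal counts
uniform = uniform_count_edges(distances, 2, 40.0)  # unequal widths, equal counts
```

With this skewed sample the even-width bins hold 7 and 1 distances, while the equal-count bins hold 4 each.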

`’sturges’`: estimates the number of evenly distributed lag classes (n) by Sturges' rule [101], where s is the number of distances:

n = \log_2(s) + 1

`’scott’`: estimates the lag class widths (h) by Scott’s rule [102], where \sigma is the standard deviation of the distances and s their number:

h = \sigma \left(\frac{24 \sqrt{\pi}}{s}\right)^{\frac{1}{3}}

`’sqrt’`: estimates the number of lags (n) by the square root of the number of distances (s):

n = \sqrt{s}

`’fd’`: estimates the lag class widths (h) using the Freedman-Diaconis estimator [103], where IQR is the interquartile range of the distances and s their number:

h = 2\frac{IQR}{s^{1/3}}

`’doane’`: estimates the number of evenly distributed lag classes using Doane’s extension to Sturges' rule [104]:

n = 1 + \log_{2}(s) + \log_2\left(1 + \frac{|g|}{k}\right)

g = E\left[\left(\frac{x - \mu_g}{\sigma}\right)^3\right]

k = \sqrt{\frac{6(s - 2)}{(s + 1)(s + 3)}}
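The five rules above can be evaluated directly. The sketch below is a plain-Python reading of the formulas, assuming x is the list of separating distances, s = len(x), the population standard deviation is used for \sigma, and the quartiles are linearly interpolated; the function names are hypothetical, and rounding to an integer number of lag classes is left out.

```python
import math
import statistics


def n_sturges(s):
    """Sturges' rule: number of lag classes from the sample size s."""
    return math.log2(s) + 1


def n_sqrt(s):
    """Square-root rule: number of lag classes from the sample size s."""
    return math.sqrt(s)


def h_scott(x):
    """Scott's rule: lag class width from the standard deviation."""
    return statistics.pstdev(x) * (24 * math.sqrt(math.pi) / len(x)) ** (1 / 3)


def h_fd(x):
    """Freedman-Diaconis: lag class width from the interquartile range."""
    q1, _, q3 = statistics.quantiles(x, n=4, method="inclusive")
    return 2 * (q3 - q1) / len(x) ** (1 / 3)


def n_doane(x):
    """Doane's extension: Sturges' rule plus a skewness correction."""
    s = len(x)
    mu = statistics.fmean(x)
    sigma = statistics.pstdev(x)
    g = sum(((v - mu) / sigma) ** 3 for v in x) / s       # sample skewness
    k = math.sqrt(6 * (s - 2) / ((s + 1) * (s + 3)))
    return 1 + math.log2(s) + math.log2(1 + abs(g) / k)


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
rules = {
    "sturges": n_sturges(len(x)),
    "sqrt": n_sqrt(len(x)),
    "scott": h_scott(x),
    "fd": h_fd(x),
    "doane": n_doane(x),
}
```

For symmetric data the skewness g vanishes, so Doane's result coincides with Sturges' rule.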

`’kmeans’`: This method will search for n clusters in the distance matrix. The cluster centroids are used to calculate the upper edges of the lag classes, by setting it to half of the distance between two neighboring clusters. Note: This does not necessarily result in even width bins.

`’ward’` uses a hierarchical clustering algorithm to iteratively merge pairs of clusters until there are only n remaining clusters. The merging is done by minimizing the variance for the merged cluster.

`’stable_entropy’` will adjust n bin edges by minimizing the absolute differences between each lag’s Shannon Entropy. This will lead to uneven bin widths. Each lag class value distribution will be of comparable intrinsic uncertainty from an information theoretic point of view, which makes the semi-variances quite comparable. However, it is not guaranteed, that the binning makes any sense from a geostatistical point of view, as the first lags might be way too wide.

References

property normalized#
property bins#

Distance lag bins

Independent variable of the experimental variogram sample. The bins are the upper edges of all calculated distance lag classes. If you need bin centers, use get_empirical.

Returns:

bins – 1D array of the distance lag classes.

Return type:

numpy.ndarray

property n_lags#

Number of lag bins

Pass the number of lag bins to be used on this Variogram instance. This will reset the grouping index and fitting parameters

property bin_count#
property estimator#
set_estimator(estimator_name)[source]#
property model#
set_model(model_name)[source]#

Set model as the new theoretical variogram function.

property use_nugget#

Use a nugget effect on this Variogram instance. If disabled, the automatic fitting procedures will omit the nugget and not use it as a model parameter.

Note

If fit_method is set to 'manual' and a nugget parameter is passed to fit, use_nugget will be set to True.

Returns:

use_nugget

Return type:

bool

classmethod wrapped_distance_function(dist_func, x, **kwargs)[source]#
property dist_function#
set_dist_function(func)[source]#

Set distance function

Set the function used for distance calculation. func can either be a callable or a string. The ranked distance function is not implemented yet. Strings will be forwarded to the scipy.spatial.distance.pdist function as the metric argument. If func is a callable, it has to return the upper triangle of the distance matrix as a flat array (like the pdist function).

Parameters:

func (string, callable)

Return type:

numpy.array
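The flat upper-triangle layout expected from a callable can be illustrated without any dependencies. The sketch below computes Manhattan distances in the same pair order that pdist uses, (0, 1), (0, 2), ..., (1, 2), ...; the function name is hypothetical, and a callable passed to the library would presumably need to return a numpy array rather than a list.

```python
def condensed_manhattan(coords):
    """Flattened upper triangle (i < j) of the Manhattan distance matrix."""
    out = []
    m = len(coords)
    for i in range(m - 1):
        for j in range(i + 1, m):
            out.append(sum(abs(a - b) for a, b in zip(coords[i], coords[j])))
    return out


points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
flat = condensed_manhattan(points)
```

For m points the flat array has m * (m - 1) / 2 entries, one per unordered pair.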

property distance#
property triangular_distance_matrix#

Like distance_matrix but with zeros below the diagonal. Only defined if distance_matrix is a sparse matrix.

property distance_matrix#
property maxlag#

Maximum lag distance to be considered in this Variogram instance. You can limit the distance at which point pairs are calculated. There are three ways to do that. First, in absolute lag units, which is a number larger than one. Second, a number 0 < maxlag < 1 can be set, which will use this share of the maximum distance as maxlag. Lastly, a string can be set: 'mean' or 'median' for the mean or median value of the distance matrix.

Notes

This setting is largely flexible, but all options except the absolute limit in lag units need the full distance matrix to be calculated. Hence, it does not speed up the calculation of large distance matrices, just the estimation of the variogram. Thus, if you pre-calculate the distance matrix using MetricSpace, only absolute limits can be used.
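The three ways of specifying maxlag can be summarized in a small resolver. This is a sketch of the rules as described, not the library's code; the helper name is hypothetical, and the handling of None (use the largest distance) is an assumption of this example.

```python
import statistics


def resolve_maxlag(maxlag, distances):
    """Translate a maxlag setting into an absolute lag distance."""
    if maxlag is None:
        return max(distances)                  # assumption: no limit
    if isinstance(maxlag, str):
        if maxlag == "median":
            return statistics.median(distances)
        if maxlag == "mean":
            return statistics.fmean(distances)
        raise ValueError("maxlag string must be 'median' or 'mean'")
    if 0 < maxlag < 1:
        return maxlag * max(distances)         # share of the maximum distance
    return float(maxlag)                       # absolute lag units


distances = [1.0, 2.0, 3.0, 10.0]
half_of_max = resolve_maxlag(0.5, distances)
median_lag = resolve_maxlag("median", distances)
mean_lag = resolve_maxlag("mean", distances)
absolute = resolve_maxlag(7, distances)
```

The example data show that 0.5 and 'median' generally give different limits: half the maximum distance is 5.0, while the median distance is 2.5.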

property fit_method#

Added in version 0.6.2.

Set the fit method to be used for this Variogram instance. Possible values are:

  • 'trf' - Trust-Region Reflective (default)

  • 'lm' - Levenberg-Marquardt

  • 'ml' - Maximum Likelihood estimation

  • 'manual' - Manual fitting by setting the parameters

Changed in version 0.6.6: Passing None will prevent the fitting procedure from running.

Notes

The default method (TRF) is a bounded least squares method, that sets constraints to the value space of all parameters. All methods use an initial guess for all used parameters. This is max(bins) for the range, max(experimental) for the sill, 20 for the Matérn smoothness, 2 for the stable model shape and 1 for the nugget if used.

property fit_sigma#

Fitting Uncertainty

Set or calculate an array of observation uncertainties aligned to the Variogram.bins. These will be used to weight the observations in the cost function, which divides the residuals by their uncertainty.

When setting fit_sigma, the array of uncertainties itself can be given, or one of the strings: [‘linear’, ‘exp’, ‘sqrt’, ‘sq’, ‘entropy’]. The parameters described below refer to the setter of this property.

Changed in version 0.3.11: added the ‘entropy’ option.

Parameters:

sigma (string, array) –

Sigma can either be an array of discrete uncertainty values, which have to align to the Variogram.bins, or of type string. Then, the weights for fitting are calculated as a function of (lag) distance.

  • sigma=’linear’: The residuals get weighted by the lag distance normalized to the maximum lag distance, denoted as w_n

  • sigma=’exp’: The residuals get weighted by the function: w = e^{1 / w_n}

  • sigma=’sqrt’: The residuals get weighted by the function: w = \sqrt{w_n}

  • sigma=’sq’: The residuals get weighted by the function: w = w_n^2

  • sigma=’entropy’: Calculates the Shannon Entropy as intrinsic uncertainty of each lag class.

Return type:

void
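The distance-based options can be evaluated for a given set of bins. The sketch below only computes the four expressions exactly as listed above, with w_n taken as the lag distance divided by the maximum lag distance; how the library converts these into the final sigma array is not covered here, and the 'entropy' option is omitted.

```python
import math

bins = [2.5, 5.0, 7.5, 10.0]

# Lag distance normalized to the maximum lag distance
w_n = [b / max(bins) for b in bins]

w = {
    "linear": list(w_n),
    "exp": [math.exp(1.0 / v) for v in w_n],
    "sqrt": [math.sqrt(v) for v in w_n],
    "sq": [v ** 2 for v in w_n],
}
```

All four expressions depend on the lag distance only, so they can be computed before any fit is run.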

Notes

The cost function is defined as:

chisq = \sum \left(\frac{r}{\sigma}\right)^2

where r are the residuals between the experimental variogram and the modeled values for the same lag. Following this function, small values will increase the influence of that residual, while a very large sigma will cause the observation to be ignored.
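The effect of sigma on the cost function can be checked numerically. This is a minimal sketch of the formula above with made-up numbers; the function name is hypothetical.

```python
def chisq(experimental, modeled, sigma):
    """Sum of squared residuals, each divided by its uncertainty."""
    return sum(((e - m) / s) ** 2 for e, m, s in zip(experimental, modeled, sigma))


experimental = [1.0, 2.0, 3.0]
modeled = [1.5, 2.0, 2.0]

# Equal uncertainty: the large residual of the last lag dominates
cost_equal = chisq(experimental, modeled, [1.0, 1.0, 1.0])

# Large sigma on the last lag: its residual is almost ignored
cost_downweighted = chisq(experimental, modeled, [1.0, 1.0, 10.0])
```

Raising sigma for the last lag from 1 to 10 reduces its contribution to the cost by a factor of 100.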

property is_cross_variogram: bool#

Read-only flag indicating if the current instance is a cross-variogram

update_kwargs(**kwargs)[source]#

Added in version 0.3.7.

Update the keyword arguments of this Variogram instance. The keyword arguments will be validated first and then update the existing kwargs. That means you can pass only the kwargs that need to be updated.

Note

Updating the kwargs does not force a preprocessing cycle. Any affected intermediate result that might be cached internally will not make use of the updated kwargs. Make a call to preprocessing(force=True) to force a clean re-calculation of the Variogram instance.

lag_groups()[source]#

Lag class groups

Returns a mask array with as many elements as self._diff has, identifying the lag class group for each pairwise difference. Can be used to extract all pairwise values within the same lag bin.

Return type:

numpy.ndarray

lag_classes()[source]#

Iterate over the lag classes

Generates an iterator over all lag classes. Can be zipped with Variogram.bins to identify the lag.

Changed in version 0.3.6: yields an empty array for empty lag groups now

Return type:

iterable
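The relation between the lag group mask and the lag classes can be sketched with the standard library alone. All data below are made up, and the names `groups` and `iter_lag_classes` are hypothetical stand-ins for the idea behind lag_groups() and lag_classes(). How a distance lying exactly on a bin edge is assigned is an assumption of this sketch.

```python
from bisect import bisect_left
from itertools import combinations

coords = [0.0, 1.0, 3.0, 6.0]      # made-up 1D coordinates
values = [1.0, 2.0, 2.5, 5.0]      # made-up observations
bins = [2.0, 4.0, 6.0]             # upper edges of three lag classes

pairs = list(combinations(range(len(coords)), 2))
distances = [abs(coords[i] - coords[j]) for i, j in pairs]
diffs = [abs(values[i] - values[j]) for i, j in pairs]

# Group index per pair: the first bin whose upper edge is >= the distance,
# or -1 if the pair lies beyond the last edge.
groups = [bisect_left(bins, d) if d <= bins[-1] else -1 for d in distances]

def iter_lag_classes():
    """Yield the pairwise differences of each lag class (empty list if none)."""
    for k in range(len(bins)):
        yield [diff for diff, g in zip(diffs, groups) if g == k]

# Zip the classes with the bins to identify the lag of each class
by_lag = dict(zip(bins, iter_lag_classes()))
```

`groups` has one entry per pairwise difference, so it can be used as a mask, and `by_lag` maps each upper bin edge to the differences that fall into that class.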

preprocessing(force=False)[source]#

Preprocessing function

Prepares all input data for the fit and transform functions. Namely, the distances and the pairwise value differences are calculated. Then the binning is set up and the bin edges are calculated. If any of these intermediate results are already prepared, their processing is skipped. This behaviour can be changed with the force parameter, which causes a clean preprocessing.

Parameters:

force (bool) – If set to True, all preprocessing data sets will be deleted. Use it in case you need a clean preprocessing.

Return type:

void

fit(force=False, method=None, sigma=None, bounds=None, p0=None, **kwargs)[source]#

Fit the variogram

The fit function will fit the theoretical variogram function to the experimental variogram. The preprocessed distance matrix, pairwise differences and binning will not be recalculated if they are already available. A recalculation can be forced by setting the force parameter to True.

If you call the fit function directly with method or sigma, the parameters set on Variogram object instantiation will get overwritten. All other keyword arguments will be passed to the scipy.optimize.curve_fit function.

Changed in version 0.3.10: added ‘ml’ and ‘custom’ method.

Changed in version 1.0.1: use_nugget is now flagged implicitly, whenever a nugget > 0 is passed in manual fitting.

Parameters:
  • force (bool) – If set to True, a clean preprocessing of the distance matrix, pairwise differences and the binning will be forced. Default is False.

  • method (string) –

    A string identifying one of the implemented fitting procedures. Can be one of:

    • lm: Levenberg-Marquardt algorithm implemented in scipy.optimize.leastsq function.

    • trf: Trust Region Reflective algorithm implemented in scipy.optimize.least_squares(method=’trf’)

    • ’ml’: Maximum-Likelihood estimation. With the current implementation only the Nelder-Mead solver for unconstrained problems is implemented. This will estimate the variogram parameters from a Gaussian parameter space by minimizing the negative log-likelihood.

    • ’manual’: Manual fitting. You can set the range, sill and nugget either directly to the fit function, or as fit_ prefixed keyword arguments on Variogram instantiation.

  • sigma (string, array) – Uncertainty array for the bins. Has to have the same dimension as self.bins. Refer to Variogram.fit_sigma for more information.

  • bounds (2-tuple of array_like or Bounds, optional) –

    Lower and upper bounds on parameters passed to scipy.optimize.curve_fit.

    Order is typically (range, sill, nugget) or (range, sill, smoothness, nugget) for individual models, or (range1, sill1, nugget1, range2, sill2, nugget2) for a sum of 2 models. Recommended for custom models, where bounds cannot be determined logically. For internal models, defaults to known min/max values for the sill (0, max variance), range (0, max lag) and smoothness (0, 2) or (0, 20) for stable and matern, respectively.

  • p0 (array_like, optional) –

    Initial guess for the parameters passed to scipy.optimize.curve_fit.

    Same order as for fit_bounds. Defaults to upper bounds values. For custom models, if no bounds are defined, defaults to 1.

Return type:

void

transform(x)[source]#

Transform

Transform a given set of lag values to the theoretical variogram function, using the current fitting and preprocessing parameters of this Variogram instance.

Parameters:

x (numpy.array) – Array of lag values to be used as model input for the fitted theoretical variogram model

Return type:

numpy.array

property fitted_model#

Fitted Model

Returns a callable that takes a distance value and returns a semivariance. This model is fitted to the current Variogram parameters. The function will be interpreted at return time with the parameters hard-coded into the function code.

Returns:

model – The current semivariance model fitted to the current Variogram model parameters.

Return type:

callable
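The idea of a callable with hard-coded parameters can be illustrated with a closure. This is a sketch only: `make_spherical` is a hypothetical helper, the textbook spherical model is used as an example, and the parameter values are made up rather than fitted.

```python
def make_spherical(effective_range, sill, nugget=0.0):
    """Return a callable with the given parameters baked in (spherical model)."""
    def model(h):
        # Beyond the range the semivariance stays at nugget + sill
        if h >= effective_range:
            return nugget + sill
        ratio = h / effective_range
        return nugget + sill * (1.5 * ratio - 0.5 * ratio ** 3)
    return model

model = make_spherical(effective_range=10.0, sill=2.0, nugget=0.5)
semivariances = [model(h) for h in (0.0, 5.0, 10.0, 25.0)]
```

The returned `model` only needs a distance value, because range, sill and nugget are fixed at creation time.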

classmethod fitted_model_function(model, cof=None, **kw)[source]#
clone()[source]#

Deep copy of self

Return a deep copy of self.

Return type:

Variogram

property experimental#

Experimental Variogram

Array of experimental (empirical) semivariance values. The array length will be aligned to Variogram.bins. The current Variogram.estimator has been used to calculate the values. Depending on the setting of Variogram.harmonize (True | False), either Variogram._experimental or Variogram.isotonic will be returned.

Returns:

vario – Array of the experimental semi-variance values aligned to Variogram.bins.

Return type:

numpy.ndarray

See also

Variogram._experimental, Variogram.isotonic

get_empirical(bin_center=False)[source]#

Empirical variogram

Returns a tuple of the independent and dependent sample values this Variogram is estimated for, that is, the current bins and the experimental semi-variance values. By default, the upper bin edges are used. Set the bin_center argument to use the bin centers instead.

Parameters:

bin_center (bool) – If set to True, the center for each distance lag bin is used over the upper limit (default).

Returns:

  • bins (numpy.ndarray) – 1D array of n_lags distance lag bins.

  • experimental (numpy.ndarray) – 1D array of n_lags experimental semi-variance values.
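The difference between upper bin edges and bin centers can be sketched as follows. The helper `bin_centers` is hypothetical and assumes that the first lag bin starts at a distance of 0; the edge values are made up.

```python
def bin_centers(upper_edges, first_lower_edge=0.0):
    """Midpoint of each lag bin, given the upper edge of every bin."""
    lower_edges = [first_lower_edge] + list(upper_edges[:-1])
    return [(lo + up) / 2 for lo, up in zip(lower_edges, upper_edges)]

upper = [10.0, 20.0, 40.0]      # made-up upper bin edges
centers = bin_centers(upper)
```

Each center sits half a bin width below its upper edge, which matters most for wide or unevenly spaced bins.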

data(n=100, force=False)[source]#

Theoretical variogram function

Calculate the experimental variogram and apply the binning. On success, the variogram model will be fitted and applied to n lag values. Returns the lags and the calculated semi-variance values. If force is True, a clean preprocessing and fitting run will be executed.

Parameters:
  • n (integer) – length of the lags array to be used for fitting. Defaults to 100, which will be fine for most plots

  • force (boolean) – If True, the preprocessing and fitting will be executed as a clean run. This will force all intermediate results to be recalculated. Defaults to False

Returns:

variogram – The first element is the created lags array, the second element contains the calculated semi-variance values.

Return type:

tuple

property residuals#

Model residuals

Calculate the model residuals defined as the differences between the experimental variogram and the theoretical model values at corresponding lag values

Deprecated since version 1.0.4: residuals can be ambiguous, thus the property is renamed to model_residuals

Return type:

numpy.ndarray

property model_residuals: ndarray#

Calculate the model residuals defined as the differences between the experimental variogram and the theoretical model values at corresponding lag values.

Returns:

residuals

Return type:

numpy.ndarray

property mean_residual#

Mean Model residuals

Calculates the mean absolute deviation between the experimental variogram and the theoretical model values.

Return type:

float

property rmse#

RMSE

Calculate the Root Mean squared error between the experimental variogram and the theoretical model values at corresponding lags. Can be used as a fitting quality measure.

Return type:

float

Notes

The RMSE is implemented like:

RMSE = \sqrt{\frac{\sum_{i=0}^{i=N(x)} (x-y)^2}{N(x)}}

property mse#

MSE

Calculate the Mean squared error between the experimental variogram and the theoretical model values at corresponding lags. Can be used as a fitting quality measure.

Return type:

float

Notes

The MSE is implemented like:

MSE = \frac{\sum_{i=0}^{i=N(x)} (x-y)^2}{N(x)}

property mae#

MAE

Calculate the Mean absolute error between the experimental variogram and the theoretical model values at corresponding lags. Can be used as a fitting quality measure.

Return type:

float

Notes

The MAE is implemented like:

MAE = \frac{\sum_{i=0}^{i=N(x)} |x-y|}{N(x)}
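The three error measures mse, rmse and mae differ only in how the residuals are aggregated. The sketch below evaluates them on made-up numbers, reading x as the experimental values and y as the model values; that reading of the symbols is an assumption.

```python
import math

experimental = [1.0, 2.0, 4.0, 5.0]   # x, made-up
modeled = [1.0, 3.0, 3.0, 7.0]        # y, made-up

residuals = [x - y for x, y in zip(experimental, modeled)]
n = len(residuals)

mse = sum(r ** 2 for r in residuals) / n      # mean squared error
rmse = math.sqrt(mse)                         # root of the MSE
mae = sum(abs(r) for r in residuals) / n      # mean absolute error
```

Because the squared measures penalize the single residual of 2 more strongly, `rmse` is larger than `mae` for these values.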

property nrmse#

NRMSE

Calculate the normalized root mean squared error between the experimental variogram and the theoretical model values at corresponding lags. Can be used as a fitting quality measure

Return type:

float

Notes

The NRMSE is implemented as:

NRMSE = \frac{RMSE}{mean(y)}

where RMSE is Variogram.rmse and y is Variogram.experimental

property root_mean_square#

Root Mean Square (RMS) of the residuals

Calculates the square root of the mean of squared residuals.

Returns:

Root Mean Square of the residuals.

Return type:

float

property residual_sum_of_squares#

Residual Sum of Squares (RSS)

Calculates the sum of squared differences between the experimental variogram and theoretical model values.

Returns:

Residual sum of squares (RSS), a measure of the overall model fit representing the sum of squared deviations between the observed experimental variogram and the corresponding theoretical model values.

Return type:

float

property rss#
property nrmse_r#

NRMSE

Alternative normalized root mean squared error between the experimental variogram and the theoretical model values at corresponding lags. Can be used as a fitting quality measure.

Return type:

float

Notes

Unlike Variogram.nrmse, nrmse_r is not normalized to the mean of y, but the difference of the maximum y to its mean:

NRMSE_r = \frac{RMSE}{max(y) - mean(y)}
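To compare the two normalizations, the following sketch computes both from the same made-up values, with y taken as the experimental variogram as stated for Variogram.nrmse. It only restates the two formulas and does not call the library.

```python
import math

experimental = [1.0, 2.0, 3.0, 10.0]   # y, made-up
modeled = [1.0, 2.0, 5.0, 8.0]         # made-up model values

n = len(experimental)
rmse = math.sqrt(sum((y - m) ** 2 for y, m in zip(experimental, modeled)) / n)
mean_y = sum(experimental) / n

nrmse = rmse / mean_y                             # normalized to mean(y)
nrmse_r = rmse / (max(experimental) - mean_y)     # normalized to max(y) - mean(y)
```

The two measures share the same RMSE and differ only in the denominator, so they are not directly comparable with each other.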

property r#

Pearson correlation of the fitted Variogram

Returns:

property NS#

Nash Sutcliffe efficiency of the fitted Variogram

Returns:
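Neither r nor NS comes with a formula here. As a sketch, the standard definitions of the Pearson correlation coefficient and the Nash-Sutcliffe efficiency are applied below to made-up experimental and model values. That the library uses exactly these definitions is an assumption.

```python
import math

experimental = [1.0, 2.0, 3.0, 4.0]   # made-up
modeled = [1.0, 2.0, 4.0, 3.0]        # made-up

n = len(experimental)
mean_e = sum(experimental) / n
mean_m = sum(modeled) / n

# Pearson correlation coefficient
cov = sum((e - mean_e) * (m - mean_m) for e, m in zip(experimental, modeled))
var_e = sum((e - mean_e) ** 2 for e in experimental)
var_m = sum((m - mean_m) ** 2 for m in modeled)
pearson_r = cov / math.sqrt(var_e * var_m)

# Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 is no better than the mean
ns = 1.0 - sum((e - m) ** 2 for e, m in zip(experimental, modeled)) / var_e
```

For these values the correlation is 0.8 and the efficiency is 0.6.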

property aic: float#
property bic: float#
model_deviations()[source]#

Model Deviations

Calculate the deviations between the experimental variogram and the recalculated values for the same bins using the fitted theoretical variogram function. Can be utilized to calculate a quality measure for the variogram fit.

Returns:

deviations – The first element is the experimental variogram, the second element contains the corresponding values of the theoretical model.

Return type:

tuple

cross_validate(method: str = 'jacknife', n: int = None, metric: str = 'rmse', seed=None) → float[source]#

Cross validation of the variogram model by means of Kriging. Right now, this function can only utilize a jacknife (leave-one-out) cross validation and will only use the builtin OrdinaryKriging method (not yet the to_gs_krige interface).

Parameters:
  • method (str) – Right now, ‘jacknife’ is the only possible input.

  • n (int) – The number of points to be included into the cross-validation. If None (default), all points will be used.

  • metric (str) – Metric used for cross-validation. Can be root mean square error (rmse), mean squared error (mse) or mean absolute error (mae).

  • seed (int) – If n is not None, the random selection of input data for the cross-validation can be seeded.

Returns:

metric – The cross-validation result as specified above.

Return type:

float
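The leave-one-out procedure itself is independent of the interpolator. The sketch below shows the loop, the optional seeded subsample of n points and the three metrics. Note that it swaps in a simple inverse-distance-weighted predictor instead of ordinary kriging, and that `idw_predict` and `leave_one_out` are hypothetical helpers with made-up data, not library functions.

```python
import math
import random

def idw_predict(train_xy, train_values, target, power=2.0):
    """Inverse-distance-weighted estimate; a stand-in for a kriging predictor."""
    num = den = 0.0
    for (x, y), v in zip(train_xy, train_values):
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return v
        w = 1.0 / d ** power
        num += w * v
        den += w
    return num / den

def leave_one_out(xy, values, n=None, metric="rmse", seed=None):
    """Predict each selected point from all others and score the errors."""
    indices = list(range(len(xy)))
    if n is not None:
        # Seeded random subset of the points to validate
        indices = random.Random(seed).sample(indices, n)
    errors = []
    for i in indices:
        rest_xy = xy[:i] + xy[i + 1:]
        rest_values = values[:i] + values[i + 1:]
        errors.append(idw_predict(rest_xy, rest_values, xy[i]) - values[i])
    if metric == "mae":
        return sum(abs(e) for e in errors) / len(errors)
    mse = sum(e ** 2 for e in errors) / len(errors)
    if metric == "mse":
        return mse
    if metric == "rmse":
        return math.sqrt(mse)
    raise ValueError(f"unknown metric: {metric!r}")

xy = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]   # made-up locations
values = [1.0, 2.0, 2.0, 3.0]                           # made-up observations
score = leave_one_out(xy, values, metric="rmse")
```

Passing the same seed together with n reproduces the same subset of points, which keeps scores comparable between runs.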

describe(short=False, flat=False)[source]#

Variogram parameters

Return a dictionary of the variogram parameters.

Changed in version 0.3.7: describe now returns all init parameters in the describe()['params'] key and all keyword arguments in describe()['kwargs']. This output can be suppressed by setting short=True.

Parameters:
  • short (bool) – If True, the 'params' and 'kwargs' keys will be omitted. Defaults to False.

  • flat (bool) – If True, the nested 'params' and 'kwargs' dicts will be merged into the main dict to return a flat dict. Defaults to False

Returns:

parameters – Returns fitting parameters of the theoretical variogram model along with the init parameters of the Variogram <skgstat.Variogram> instance.

Return type:

dict
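What flattening means for the returned dictionary can be sketched as follows. The nested dictionary is purely illustrative: its keys and values are assumptions made for this example and are not actual describe() output, and `flatten_description` is a hypothetical helper.

```python
def flatten_description(description):
    """Merge the nested 'params' and 'kwargs' dicts into the top level."""
    flat = {k: v for k, v in description.items() if k not in ("params", "kwargs")}
    flat.update(description.get("params", {}))
    flat.update(description.get("kwargs", {}))
    return flat

# Hypothetical nested output; keys and values are illustrative only
nested = {
    "model": "spherical",
    "sill": 2.0,
    "params": {"n_lags": 10, "use_nugget": False},
    "kwargs": {},
}
flat = flatten_description(nested)
```

A flat dictionary of this kind is convenient when several variograms are collected as rows of a table.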

property parameters#

Extract just the variogram parameters range, sill and nugget from the describe output.

Returns:

params – [range, sill, nugget] for most models and [range, sill, shape, nugget] for matern and stable model. [range1, sill1, nugget1, range2, sill2, nugget2] for a sum of 2 models. [param1, param2, param3, …] in order for a custom model.

Return type:

list

to_DataFrame(n=100, force=False)[source]#

Variogram DataFrame

Returns the fitted theoretical variogram as a pandas.DataFrame instance. The n and force parameter control the calculation, refer to the data function for more info.

Deprecated since version 1.0.10: The return value of this function will change with a future release

Parameters:
  • n (integer) – length of the lags array to be used for fitting. Defaults to 100, which will be fine for most plots

  • force (boolean) – If True, the preprocessing and fitting will be executed as a clean run. This will force all intermediate results to be recalculated. Defaults to False

Return type:

pandas.DataFrame

See also

Variogram.data

to_gstools(**kwargs)[source]#

Instantiate a corresponding GSTools CovModel.

By default, this will be an isotropic model.

Parameters:

**kwargs – Keyword arguments forwarded to the instantiated GSTools CovModel. The default parameters ‘dim’, ‘var’, ‘len_scale’, ‘nugget’, ‘rescale’ and optional shape parameters will be extracted from the given Variogram but they can be overwritten here.

Raises:
  • ImportError – When GSTools is not installed.

  • ValueError – When GSTools version is not v1.3 or greater.

  • ValueError – When given Variogram model is not supported (‘harmonize’).

Warns:

Warning – If the Variogram is a cross-variogram

Returns:

Corresponding GSTools covmodel.

Return type:

CovModel

Note

In case you intend to use the coordinates in a GSTools workflow, you need to transpose the coordinate array like:

>>> cond_pos = Variogram.coordinates.T

to_gs_krige(**kwargs)[source]#

Instantiate a GSTools Krige class.

This can only export isotropic models. Note: the fit_variogram is always set to False

Parameters:
  • variogram (skgstat.Variogram) – Scikit-GStat Variogram instance

  • **kwargs – Keyword arguments forwarded to GSTools Krige. Refer to Krige to learn about all possible options. Note that the fit_variogram parameter will always be False.

Raises:
  • ImportError – When GSTools is not installed.

  • ValueError – When GSTools version is not v1.3 or greater.

  • ValueError – When given Variogram model is not supported (‘harmonize’).

Warns:

Warning – If the Variogram is a cross-variogram

Returns:

Instantiated GSTools Krige class.

Return type:

Krige

See also

gstools.Krige

plot(axes=None, grid=True, show=True, hist=True)[source]#

Variogram Plot

Plot the experimental variogram, the fitted theoretical function and a histogram for the lag classes. The axes attribute can be used to pass a list of AxesSubplots or a single instance to the plot function, which will then be used for plotting. If only a single instance is passed, the hist attribute will be ignored, as only the variogram will be plotted.

Changed in version 0.4.0: This plot can be plotted with the plotly plotting backend

Parameters:
  • axes (list, tuple, array, AxesSubplot or None) – If None, the plot function will create a new matplotlib figure. Otherwise a single instance or a list of AxesSubplots can be passed to be used. If a single instance is passed, the hist attribute will be ignored.

  • grid (bool) – Defaults to True. If True a custom grid will be drawn through the lag class centers

  • show (bool) – Defaults to True. If True, the show method of the passed or created matplotlib Figure will be called before returning the Figure. This should be set to False, when used in a Notebook, as a returned Figure object will be plotted anyway.

  • hist (bool) – Defaults to True. If False, the creation of a histogram for the lag classes will be suppressed.

Return type:

matplotlib.Figure

scattergram(ax=None, show=True, **kwargs)[source]#

Scattergram plot

Groups the values by lags and plots the head and tail values of all point pairs within the groups against each other. This can be used to investigate the distribution of the value residuals.

Changed in version 0.4.0: This plot can be plotted with the plotly plotting backend

Parameters:
  • ax (matplotlib.Axes, plotly.graph_objects.Figure) – If None, a new plotting Figure will be created. If given, it has to be an instance of the used plotting backend, which will be used to plot on.

  • show (boolean) – If True (default), the show method of the Figure will be called. Can be set to False to prevent duplicated plots in some environments.

Returns:

fig – Resulting figure, depending on the plotting backend

Return type:

matplotlib.Figure, plotly.graph_objects.Figure

location_trend(axes=None, show=True, **kwargs)[source]#

Location Trend plot

Plots the values over each dimension of the coordinates in a scatter plot. This will visually show correlations between the values and any of the coordinate dimensions. If there is a value dependence on the location, this would violate the intrinsic hypothesis, which is a weaker form of second-order stationarity.

Changed in version 0.4.0: This plot can be plotted with the plotly plotting backend

Parameters:
  • axes (list) – Can be None (default) or a list of matplotlib.AxesSubplots. If a list is passed, the location trend plots will be plotted on the given instances. Note that the length of the list has to match the dimensionality of the coordinates array. In case 3D coordinates are used, three subplots have to be given.

  • show (boolean) – If True (default), the show method of the Figure will be called. Can be set to False to prevent duplicated plots in some environments.

Keyword Arguments:

add_trend_line (bool) –

Added in version 0.3.5.

If set to True, the class will fit a linear model to each coordinate dimension and output the model along with a calculated R². With high R² values, you should consider rejecting the input data, or transforming it.

Note

Right now, this is only supported for 'plotly' backend
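The trend check behind add_trend_line can be reproduced by hand: fit a straight line of the values against each coordinate dimension and look at R². The sketch below uses ordinary least squares on made-up data. `linear_fit_r2` is a hypothetical helper, and whether the library fits the line in exactly this way is an assumption.

```python
def linear_fit_r2(x, y):
    """Ordinary least-squares line y = slope * x + intercept and its R²."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot

coords = [(0.0, 3.0), (1.0, 1.0), (2.0, 4.0), (3.0, 2.0)]   # made-up 2D coordinates
values = [1.0, 3.0, 5.0, 7.0]   # rises linearly with the first coordinate

# One fit per coordinate dimension
fits = [linear_fit_r2([c[dim] for c in coords], values) for dim in range(2)]
```

In this constructed case the first dimension explains the values completely, while the second shows no linear trend at all.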

Returns:

fig – The figure produced by the function. Depends on the current backend.

Return type:

matplotlib.Figure, plotly.graph_objects.Figure

distance_difference_plot(ax=None, plot_bins=True, show=True)[source]#

Raw distance plot

Plots all absolute value differences of all point pair combinations over their separating distance, without sorting them into a lag.

Changed in version 0.4.0: This plot can be plotted with the plotly plotting backend

Parameters:
  • ax (None, AxesSubplot) – If None, a new matplotlib.Figure will be created. In case a Figure was already created, pass the Subplot to use as ax argument.

  • plot_bins (bool) – If True (default) the bin edges will be included into the plot.

  • show (bool) – If True (default), the show method of the Figure will be called before returning the Figure. Can be set to False, to avoid doubled figure rendering in Jupyter notebooks.

Return type:

matplotlib.pyplot.Figure

__repr__()[source]#

Textual representation of this Variogram instance.

Returns:

__str__()[source]#

String Representation

Descriptive representation of this Variogram instance that shall give the main variogram parameters in a print statement.

Returns:

description – String description of the variogram instance. Described by the Variogram parameters.

Return type:

str