Utility Functions
Shannon Entropy
- skgstat.util.shannon.shannon_entropy(x, bins)
Shannon Entropy
Calculates the Shannon Entropy, the most basic metric in information theory. It can be used to calculate the information content of discrete distributions, and thus to estimate the intrinsic uncertainty of a sample independent of its value range or variance, which makes entropy values comparable between samples.
- Parameters:
x (numpy.ndarray) – flat 1D array of the observations
bins (list, int) – upper edges of the bins used to calculate the histogram of x.
- Returns:
h – Shannon Entropy of x, given bins.
- Return type:
float
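As a rough illustration of the quantity described above, the sketch below computes the entropy of a histogram with plain NumPy. It is not the skgstat implementation: the function name `shannon_entropy_sketch` is hypothetical, the base-2 logarithm (entropy in bits) is an assumption, and `bins` is passed straight to `numpy.histogram`, which expects a bin count or the full sequence of bin edges rather than upper edges only.

```python
import numpy as np


def shannon_entropy_sketch(x, bins):
    """Hypothetical helper: Shannon entropy (in bits) of the histogram of x.

    Assumption: base-2 logarithm; skgstat may use another base or treat
    empty bins differently.
    """
    # Count observations per bin; `bins` may be an int or a sequence of edges.
    counts, _ = np.histogram(np.asarray(x).ravel(), bins=bins)
    # Empirical probability of each bin.
    p = counts / counts.sum()
    # Empty bins contribute nothing (p * log(1/p) -> 0 as p -> 0), so drop them.
    p = p[p > 0]
    # H = sum(p * log2(1 / p))
    return float(np.sum(p * np.log2(1.0 / p)))


# Four equally populated bins: maximum uncertainty for 4 classes.
print(shannon_entropy_sketch([0.5, 1.5, 2.5, 3.5], bins=[0, 1, 2, 3, 4]))  # 2.0
# All observations fall into one bin: no uncertainty.
print(shannon_entropy_sketch([0.5, 0.6, 0.7], bins=[0, 1, 2]))  # 0.0
```

Because only the bin probabilities enter the formula, rescaling or shifting `x` together with its bin edges leaves the result unchanged, which is what makes the measure independent of the value range.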
Cross Validation
- skgstat.util.cross_validation.jacknife(variogram, n: int = None, metric: str = 'rmse', seed=None) -> float
Leave-one-out cross validation of the given variogram model using an OrdinaryKriging instance. This method can also be called via Variogram.cross_validate.
- Parameters:
variogram (skgstat.Variogram) – The variogram instance to be validated.
n (int) – Number of points that should be used for cross validation. If None is given, all points are used (default).
metric (str) – Metric used for cross validation. Can be one of [‘rmse’, ‘mse’, ‘mae’]
- Returns:
metric – Cross-validation result. The value is given in the selected metric.
- Return type:
float
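The leave-one-out procedure can be sketched in plain NumPy: each selected point is removed, predicted from all remaining points by ordinary kriging with a fixed variogram model, and the residuals are summarized with the chosen metric. This is an illustration, not the skgstat code: the names `spherical`, `ok_predict`, `jacknife_sketch` and `model` are hypothetical, the model is passed as a plain function instead of a `skgstat.Variogram`, no neighbourhood search is done, and drawing `n` random points with `seed` is an assumption about how those parameters behave.

```python
import numpy as np


def spherical(h, effective_range, sill, nugget=0.0):
    """Spherical variogram model; gamma(0) is set to 0."""
    h = np.asarray(h, dtype=float)
    r = np.minimum(h / effective_range, 1.0)
    gamma = nugget + sill * (1.5 * r - 0.5 * r ** 3)
    return np.where(h == 0, 0.0, gamma)


def ok_predict(coords, values, target, model):
    """Ordinary kriging estimate at `target` using all given points."""
    n = len(values)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    # Kriging system [[Gamma, 1], [1^T, 0]] @ [w, mu] = [gamma_0, 1];
    # the last row forces the weights to sum to 1 (unbiasedness).
    a = np.ones((n + 1, n + 1))
    a[:n, :n] = model(dist)
    a[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = model(np.linalg.norm(coords - target, axis=1))
    weights = np.linalg.solve(a, b)[:n]
    return float(weights @ values)


def jacknife_sketch(coords, values, model, n=None, metric="rmse", seed=None):
    """Hypothetical leave-one-out cross validation with ordinary kriging."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    all_idx = np.arange(len(values))
    idx = all_idx
    if n is not None:
        # Assumption: n points are drawn at random, without replacement.
        idx = np.random.default_rng(seed).choice(all_idx, size=n, replace=False)
    residuals = np.empty(len(idx))
    for k, i in enumerate(idx):
        keep = all_idx != i  # leave point i out
        pred = ok_predict(coords[keep], values[keep], coords[i], model)
        residuals[k] = pred - values[i]
    if metric == "rmse":
        return float(np.sqrt(np.mean(residuals ** 2)))
    if metric == "mse":
        return float(np.mean(residuals ** 2))
    if metric == "mae":
        return float(np.mean(np.abs(residuals)))
    raise ValueError(f"unknown metric: {metric!r}")


def model(h):
    # Assumed model parameters, chosen for illustration only.
    return spherical(h, effective_range=40.0, sill=1.0, nugget=0.01)


rng = np.random.default_rng(42)
coords = rng.uniform(0, 100, size=(30, 2))
values = np.sin(coords[:, 0] / 20) + rng.normal(0, 0.1, size=30)
print(jacknife_sketch(coords, values, model, metric="rmse"))
```

Passing the model as a callable keeps the sketch independent of skgstat; with skgstat installed, Variogram.cross_validate is the documented entry point.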
Uncertainty Propagation
- skgstat.util.uncertainty.propagate(variogram: Variogram = None, source: str | List[str] = 'values', sigma: float | List[float] = 5, evalf: str | List[str] = 'experimental', verbose: bool = False, use_bounds: bool = False, **kwargs)
Uncertainty propagation for the variogram. For a given Variogram instance, a source of error and the scale of the error distribution can be specified. The function propagates the uncertainty into different parts of the Variogram and returns the confidence intervals or error bounds.
- Parameters:
variogram (skgstat.Variogram) – The base variogram. The variogram parameters will be used as fixed arguments for the Monte Carlo simulation.
source (list) – Source of uncertainty. This has to be an attribute of Variogram. Right now only 'values' is really supported; anything else is untested.
sigma (list) – Standard deviation of the error distribution.
evalf (list) – Evaluation function. This specifies which part of the Variogram should be evaluated. Possible values are 'experimental' for the experimental variogram, 'model' for the fitted model and 'parameter' for the variogram parameters.
verbose (bool) – If True, the uncertainty_framework package used under the hood will print a progress bar to the console. Defaults to False.
use_bounds (bool) – Shortcut to set the confidence interval bounds to the minimum and maximum value and thus return the error margins over a confidence interval.
- Keyword Arguments:
distribution (str) – Any valid numpy.random distribution function that takes the scale as an argument. Defaults to 'normal'.
q (int) – Width (percentile) of the confidence interval. Has to be a number between 0 and 100. 0 will result in the minimum and maximum value as bounds. 100 turns both bounds into the median value. Defaults to 10.
num_iter (int) – Number of iterations used in the Monte Carlo simulation. Defaults to 500.
eval_at (int) – If evalf is set to 'model', the theoretical model gets evaluated at this many evenly spaced lags up to the maximum lag. Defaults to 100.
n_jobs (int) – The evaluation can be performed in parallel. This specifies how many processes may be spawned in parallel. None will spawn only one (default).
Note
This is an untested experimental feature.
- Returns:
conf_interval – Confidence interval of the uncertainty propagation as [lower, median, upper]. If more than one evalf is given, a list of ndarrays will be returned. See notes for more details.
- Return type:
numpy.ndarray, or a list of numpy.ndarray if more than one evalf is given
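The description of q above (0 gives the minimum and maximum, 100 collapses both bounds onto the median) is consistent with removing q/2 percent of the Monte Carlo results from each tail. The sketch below illustrates that reading with numpy.percentile; the helper name `interval_from_q` and the q/2 split are assumptions for illustration, not taken from the skgstat source.

```python
import numpy as np


def interval_from_q(samples, q=10):
    """Hypothetical helper: [lower, median, upper] of Monte Carlo samples.

    Assumption: q percent of the samples are excluded in total, half from
    each tail. q=0 then yields (min, median, max) and q=100 yields the
    median three times.
    """
    if not 0 <= q <= 100:
        raise ValueError("q has to be between 0 and 100")
    # samples has shape (num_iter,) or (num_iter, N); percentiles are taken
    # over the iterations, giving shape (3,) or (3, N). Transposing yields
    # one [lower, median, upper] row per evaluated item.
    return np.percentile(samples, [q / 2, 50, 100 - q / 2], axis=0).T


samples = np.arange(101.0)  # stand-in for 101 Monte Carlo results
print(interval_from_q(samples, q=0))    # min, median, max
print(interval_from_q(samples, q=10))   # 5th, 50th and 95th percentile
print(interval_from_q(samples, q=100))  # median three times
```

Under this assumption, the default q=10 corresponds to the 5th and 95th percentiles, i.e. a 90 % interval.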
Notes
For each member of the evaluated property, the lower and upper bound along with the median value are returned as [low, median, up]. Thus the returned array has the shape (N, 3). N is the length of the evaluated property: Variogram.n_lags for 'experimental'; either 3 for 'parameter', or 4 if Variogram.model = 'stable' | 'matern'; and 100 for 'model', as the model gets evaluated at 100 evenly spaced lags up to the maximum lag class. This amount can be changed using the eval_at parameter.
If more than one evalf parameter is given, the Variogram will be evaluated at multiple steps and each one will be returned as a confidence interval. Thus if len(evalf) == 2, a list containing two confidence interval matrices will be returned. The order is [experimental, parameter, model].
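To make the (N, 3) shape described in the Notes concrete, the sketch below propagates normally distributed errors in the observations into a Matheron experimental variogram by plain Monte Carlo, roughly what source='values' with evalf='experimental' describes. It is a simplified stand-in, not the skgstat implementation (which delegates to the uncertainty_framework package): the function names `experimental_variogram` and `propagate_values_sketch`, the lag-class binning via numpy.digitize and the q/2 percentile convention are all assumptions.

```python
import numpy as np


def experimental_variogram(coords, values, bin_edges):
    """Matheron estimator: gamma = sum((z_i - z_j)**2) / (2 * N) per lag class."""
    i, j = np.triu_indices(len(values), k=1)  # every point pair once
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    sq_diff = (values[i] - values[j]) ** 2
    # Lag class k holds pairs with bin_edges[k] <= dist < bin_edges[k + 1];
    # pairs outside the edges get an out-of-range index and are ignored.
    cls = np.digitize(dist, bin_edges) - 1
    gamma = np.full(len(bin_edges) - 1, np.nan)  # empty classes stay NaN
    for k in range(len(gamma)):
        in_class = cls == k
        if in_class.any():
            gamma[k] = 0.5 * sq_diff[in_class].mean()
    return gamma


def propagate_values_sketch(coords, values, bin_edges, sigma=5.0, q=10,
                            num_iter=500, seed=None):
    """Hypothetical Monte Carlo propagation of errors in the values.

    Returns an array of shape (n_lags, 3) holding [lower, median, upper]
    for each lag class.
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    runs = np.empty((num_iter, len(bin_edges) - 1))
    for it in range(num_iter):
        # Perturb every observation with normal noise of scale sigma.
        noisy = values + rng.normal(0.0, sigma, size=values.shape)
        runs[it] = experimental_variogram(coords, noisy, bin_edges)
    # Assumed convention: q/2 percent cut from each tail.
    return np.percentile(runs, [q / 2, 50, 100 - q / 2], axis=0).T


rng = np.random.default_rng(0)
coords = rng.uniform(0, 100, size=(50, 2))
values = rng.normal(100, 20, size=50)
edges = np.linspace(0, 60, 7)  # 6 lag classes of 10 units each
ci = propagate_values_sketch(coords, values, edges, sigma=5, num_iter=200, seed=1)
print(ci.shape)  # (6, 3): one [lower, median, upper] row per lag class
```

Each row corresponds to one lag class, so N equals the number of lag classes here, matching the 'experimental' case in the Notes.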