Change of support
Purpose
As the sample volume can be very small compared to the extraction volume, overestimation above a defined cutoff can be a problem: sample values and extraction-volume values rarely follow the same distribution, and switching naively from one to the other can lead to costly decisional mistakes. Moreover, as the variable of interest is rarely Gaussian, a Gaussian transformation is introduced first. This transformation relies on several mathematical concepts that are presented below.
This notebook shows how DeepLime's geolime library can be used for this purpose. It details a step-by-step procedure to achieve this transformation, with some theoretical background introduced at each step.
Dataset and descriptive statistics
The Walker Lake dataset, available within geolime, will be used throughout this notebook. It is widely used in geostatistics, which makes the results easy to compare with the literature.
Descriptive statistics are provided after data preprocessing. The dataset can be loaded with the following Python commands, once the required libraries have been imported.
%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
import geolime as glm
walker_lake_path = "../../tests/data/walker_dataset/walker.csv"
data = pd.read_csv(walker_lake_path, sep=' ')
# Replace implausibly large values of v (likely missing-data placeholders) with 0
data.loc[data['v'] > 1e10, 'v'] = 0
data.head()
|   | id | x | y | u | v | t |
|---|----|---|---|------|------|---|
| 0 | 1 | 11 | 8 | 0.0 | 0.0 | 2 |
| 1 | 2 | 8 | 30 | 0.0 | 0.0 | 2 |
| 2 | 3 | 9 | 48 | 224.4 | 0.0 | 2 |
| 3 | 4 | 8 | 69 | 434.4 | 0.0 | 2 |
| 4 | 5 | 9 | 90 | 412.1 | 0.0 | 2 |
This 2D spatial dataset contains two generic variables of interest (u and v, concentrations in ppm, i.e. a chemical or contaminant mass per unit volume) and one indicator variable (t). The following command displays basic data statistics.
data.describe()
|       | id | x | y | u | v | t |
|-------|----|---|---|---|---|---|
| count | 470.000000 | 470.000000 | 470.000000 | 470.000000 | 470.000000 | 470.000000 |
| mean | 235.500000 | 111.089362 | 141.221277 | 435.298723 | 353.451702 | 1.904255 |
| std | 135.821574 | 70.894682 | 77.968954 | 299.882302 | 657.897488 | 0.294554 |
| min | 1.000000 | 8.000000 | 8.000000 | 0.000000 | 0.000000 | 1.000000 |
| 25% | 118.250000 | 51.000000 | 80.000000 | 184.600000 | 0.000000 | 2.000000 |
| 50% | 235.500000 | 89.000000 | 139.500000 | 424.000000 | 25.300000 | 2.000000 |
| 75% | 352.750000 | 170.000000 | 208.000000 | 640.850000 | 428.625000 | 2.000000 |
| max | 470.000000 | 251.000000 | 291.000000 | 1528.100000 | 5190.100000 | 2.000000 |
Data must be converted to NumPy arrays prior to use with geolime.
coords = data[['x', 'y']].to_numpy()
values_u = data[['u']].to_numpy()
values_v = data[['v']].to_numpy()
Histograms can then be plotted to show the distributions of the u and v variables of interest.
plt.hist(values_u, bins=20, label='Distribution of u')
plt.legend();
plt.hist(values_v, bins=20, label='Distribution of v')
plt.legend();
The u variable distribution appears irregular and will be analyzed further in the following sections. Its normality can be tested with a Shapiro-Wilk test.
shapiro_test = stats.shapiro(values_u)
print("Shapiro-Wilk test p-value:", shapiro_test.pvalue)
Shapiro-Wilk test p-value: 1.6578262096444973e-09
The p-value is extremely small, so we can reject the null hypothesis that the u variable follows a normal distribution. A Gaussian anamorphosis followed by a change of support therefore appears well suited and will be considered in the following sections. We first need to compute the associated Gauss scores.
Anamorphosis
Gauss scores
Gauss scores can simply be computed the following way.
gauss_u = glm.gauss_score(values_u.flatten())
gauss_u = np.array([gauss_u])
plt.hist(gauss_u[0], bins=20, label='Distribution of Gaussian scores')
plt.legend();
A Shapiro-Wilk test on the Gauss scores now yields a non-significant p-value, so the null hypothesis of normality can no longer be rejected.
shapiro_test = stats.shapiro(gauss_u)
print("Shapiro-Wilk test p-value:", shapiro_test.pvalue)
Shapiro-Wilk test p-value: 0.999985933303833
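For intuition, the idea behind Gauss scores (also called normal scores) can be sketched in a few lines of plain NumPy/SciPy: each value is ranked and mapped to the standard normal quantile of its empirical cumulative frequency. The helper normal_scores below is an illustrative sketch, not geolime's implementation, and its tie handling or frequency convention may differ from glm.gauss_score.
import numpy as np
from scipy import stats

# Illustrative normal-score transform: rank the values, convert ranks to
# empirical cumulative frequencies, then map them to standard normal quantiles.
def normal_scores(values):
    ranks = stats.rankdata(values, method='average')
    freq = (ranks - 0.5) / len(values)
    return stats.norm.ppf(freq)

# e.g. normal_scores(values_u.flatten()) should be close to gauss_u[0],
# up to differences in tie handling and frequency conventions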
Gauss scores can be plotted with the following commands.
fig, ax = plt.subplots()
ax.scatter(gauss_u, values_u, color='b', s=0.2)
ax.set_title("Gauss scores vs observed values")
plt.xlabel('Gauss scores')
plt.ylabel('Values')
plt.show()
Gauss scores will be used as input for variogram modelling in the following sections.
Hermite polynomials
Hermite orthogonal polynomials are a very convenient tool for approximating a function of a Gaussian random variable. Computationally, they are widely used to avoid evaluating a cumulative distribution function through an integral. geolime provides a set of API calls to expand a variable of interest (Gaussian, for instance after a non-Gaussian variable has been transformed through the Gauss score anamorphosis seen previously) on this polynomial basis.
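For illustration only, a truncated expansion of the form z = sum_n psi_n * H_n(y) / sqrt(n!) can be evaluated with NumPy's probabilists' Hermite polynomials; the helper hermite_expansion_sketch and the coefficients psi_demo below are placeholders introduced for this sketch, not values produced by geolime, whose normalization convention may differ.
import numpy as np
from math import factorial
from numpy.polynomial.hermite_e import hermeval

# Evaluate a truncated expansion sum_n psi_n * He_n(y) / sqrt(n!), where He_n
# are the probabilists' Hermite polynomials. The coefficients are placeholders
# chosen for illustration, not values fitted by geolime.
def hermite_expansion_sketch(y, psi):
    scaled = [p / np.sqrt(factorial(n)) for n, p in enumerate(psi)]
    return hermeval(y, scaled)

psi_demo = np.array([435.0, 290.0, 60.0])  # mean plus two correction terms
print(hermite_expansion_sketch(np.array([-1.0, 0.0, 1.0]), psi_demo))
In geolime, the Hermite polynomial expansion of a variable can be retrieved with the following calls.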
anam = glm.Anamorphosis(data=values_u.flatten())
anam_theo_gauss = anam.theoretical_transform(gauss_u, 500)
The Hermite polynomial expansion of order 500 can then be evaluated over a regular range of Gaussian values and plotted.
x = np.linspace(-3, 3, 100)
fig, ax = plt.subplots()
ax.plot(x, anam.theoretical_transform(x, 500), color='b')
ax.set_title("Hermite expansion evaluation")
plt.show()
This anamorphosis object, which holds the Hermite polynomial expansion, will be used later on for estimation.
Variography
The data are now Gaussian and a change of support can be computed. It needs three parameters:
- Hermite coefficients of the anamorphosis transform
- Variogram model
- A "support", which is the size of a regular block unit
The variogram model and the support are the same as in the previous notebooks (Variography and Kriging). The variogram model is extracted from an autofit process.
target_grid = glm.create_grid_2d(origin=[0, 0], cell_size=[10, 10], n_dim=[26, 30], angle=[0, 0, 0])
pattern_smu = glm.GridPatternGenerator(target_grid, [5, 5])
cov_model = 29535 * glm.Nugget() + 31156 * glm.Spherical(2, angles=339, scales=[30, 37]) + 35487 * glm.Spherical(2, angles=160, scales=[30, 90])
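To give an idea of how the support enters the computation, the sketch below estimates the mean covariance within a block by discretizing it into a regular pattern of points, which is the quantity the change of support relies on. The function mean_block_covariance and the exponential covariance exp_cov are illustrative assumptions, not geolime API calls.
import numpy as np

# Illustrative sketch: average covariance within a square block, estimated by
# discretizing the block into a regular n_discr x n_discr pattern of points.
# cov_func is any isotropic covariance given as a function of distance.
def mean_block_covariance(cov_func, block_size, n_discr=5):
    xs = (np.arange(n_discr) + 0.5) / n_discr * block_size
    pts = np.array([(x, y) for x in xs for y in xs])
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    return cov_func(dist).mean()

# Example with a hypothetical exponential covariance (sill 30000, range 40 m)
exp_cov = lambda d: 30000.0 * np.exp(-d / 40.0)
print(mean_block_covariance(exp_cov, block_size=5.0))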
We now have all the required variables to compute the change of support coefficient.
Change of support
As briefly stated in the purpose section, the change of support ("support" refers to the sample size or volume) allows us to apply a consistent transformation when dealing with highly variable samples (typically drill hole cores). This variability becomes problematic when estimating an interest variable over larger surfaces or volumes. The technique harmonizes the estimates on blocks, and a single scalar coefficient provides an appealing theoretical way to switch smoothly from small samples to larger volumes. This change of support coefficient is required for discrete Gaussian kriging or conditional expectation when estimating spatial variables that can follow any type of distribution; these estimation techniques are covered in adjacent notebooks.
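Conceptually (under the discrete Gaussian model), the block-support Hermite coefficients are obtained by damping the point-support coefficients, psi_n -> psi_n * r**n, where r is chosen so that the resulting block variance matches the mean covariance within a block. The sketch below illustrates this idea; solve_r, block_coefficients and the demo values are assumptions introduced for illustration, and glm.change_of_support remains the reference implementation.
import numpy as np
from scipy.optimize import brentq

# Illustrative sketch of the discrete Gaussian change of support:
# find r in (0, 1) such that sum_{n>=1} psi_n**2 * r**(2n) equals the
# block variance (mean covariance within a block), then damp the
# point-support coefficients by r**n to get the block-support ones.
def solve_r(psi, block_variance):
    n = np.arange(1, len(psi))
    f = lambda r: np.sum(psi[1:] ** 2 * r ** (2 * n)) - block_variance
    return brentq(f, 1e-6, 1.0 - 1e-9)

def block_coefficients(psi, r):
    return psi * r ** np.arange(len(psi))

# Hypothetical coefficients and block variance, for illustration only
psi_demo = np.array([353.0, 300.0, -90.0, 20.0])
r_demo = solve_r(psi_demo, block_variance=60000.0)
print(r_demo, block_coefficients(psi_demo, r_demo))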
The steps carried out previously can now be used to compute this coefficient. Its computation requires the covariance model fitted on the raw sample values.
anam_coeff_points = anam.hermite_coefficients(500)
r, anam_coeff_blocks = glm.change_of_support(anam_coeff_points, cov_model, pattern_smu)
print("Change of support coefficient:", r)
Change of support coefficient: 0.7777777777777778
The support effect can be understood and visualized by plotting the punctual and block distributions of the samples.
plt.figure(figsize=(20, 5))
plt.subplot(1, 2, 1)
plt.hist(values_u, bins=20, label='Punctual')
plt.xlabel('Concentration (ppm)')
plt.ylabel('Frequency')
plt.legend();
plt.subplot(1, 2, 2)
plt.hist(glm.evaluate_hermite_expansion(anam.gauss_scores, anam_coeff_blocks), bins=20, label='Block')
plt.xlabel('Concentration (ppm)')
plt.ylabel('Frequency')
plt.legend();
The following code snippet computes the change of support coefficient for several block sizes between 1 m (almost punctual) and 50 m.
discr = np.arange(start=1, stop=50, step=1)
rl = []
for v in discr:
    target_grid_i = glm.create_grid_2d(origin=[0, 0], n_dim=[26, 30], cell_size=[v, v], angle=[0.0, 0.0, 0.0])
    pattern_i = glm.GridPatternGenerator(target_grid_i, [5, 5])
    r, ablock = glm.change_of_support(anam_coeff_points, cov_model, pattern_i)
    rl.append(r)
fig, ax = plt.subplots()
ax.scatter(discr, rl, color='b')
ax.set_title("Effect of cell size on change of support coefficient")
plt.xlabel('Cell size in meters')
plt.ylabel('Change of support coefficient')
plt.show()
The change of support coefficient decreases as the cell size of the grid increases: a larger block has a smaller dispersion variance, hence a smaller coefficient and a stronger damping of the Hermite coefficients.
Estimation
Estimation is ultimately the main purpose of the change of support. In practice, an operator wants to know the metal content of a block above a fixed cutoff value. Uniform conditioning comes in handy for that purpose: it is a nonlinear estimation technique that estimates the distribution of metal and tonnage above cutoff within a mining panel.
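For intuition, the ore proportion P(Z > z_cut) and the metal content E[Z * 1(Z > z_cut)] above a cutoff can be approximated by Monte Carlo simulation of the Gaussian variable pushed through an anamorphosis. The helper ore_metal_mc and the lognormal-like transform phi_demo below are illustrative assumptions, not geolime functions; glm.ore_metal computes these quantities directly from the Hermite coefficients, as shown further down.
import numpy as np

# Monte Carlo check (illustrative only): for an anamorphosis z = phi(y) with
# Y standard normal, estimate the ore proportion P(Z > z_cut) and the metal
# content E[Z * 1_{Z > z_cut}] above a cutoff.
def ore_metal_mc(phi, z_cut, n=200_000, seed=0):
    y = np.random.default_rng(seed).standard_normal(n)
    z = phi(y)
    above = z > z_cut
    return above.mean(), np.mean(z * above)

# Hypothetical lognormal-like transform with mean 350, for illustration only
phi_demo = lambda y: 350.0 * np.exp(0.8 * y - 0.32)
print(ore_metal_mc(phi_demo, z_cut=800.0))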
The block anamorphosis, needed prior to estimation, can be computed seamlessly by combining the Hermite coefficients seen previously with the change of support coefficient.
cut_offs = np.linspace(np.min(values_u), np.max(values_u), 100)
a = np.array([glm.ore_metal(zcut, anam_coeff_points) for zcut in cut_offs])
b = np.array([glm.ore_metal(zcut, anam_coeff_blocks) for zcut in cut_offs])
plt.figure(figsize=(20, 5))
plt.subplot(1,2,1)
plt.plot(cut_offs, a[:, 0], color='r', label="Punctual Proportion")
plt.plot(cut_offs, b[:, 0], color='b', label="Block Proportion")
plt.legend(loc='best')
plt.xlabel("Cut Off")
plt.ylabel("Ore Proportion")
plt.subplot(1,2,2)
plt.plot(cut_offs, a[:, 1], color='r', label="Punctual Metal")
plt.plot(cut_offs, b[:, 1], color='b', label="Block Metal")
plt.legend(loc='best')
plt.xlabel("Cut Off")
plt.ylabel("Metal content")
plt.show()
The left plot shows the proportion of samples above a given cut-off grade.
The right plot shows the recoverable metal content for a given cut-off grade. The support effect clearly shows that estimating on point samples leads to an overestimation of the deposit.
For instance, for a cut-off of 800 ppm, punctual estimation leads to a metal content of 120 g per volume unit whereas block estimation leads to a metal content of 75 g per volume unit.