kriging¶

kriging

Kriging¶

Purpose¶

This notebook describes step-by-step guidelines used for applying kriging algorithm using Deeplime's geolime geostatistics library. A widespread spatial dataset is used as en entrypoint for applying the geolime API for kriging. This datased will first be statistically described prior to apply ordinary and simple kriging (punctual or by block). The outputs will then be displayed numerically and graphically for interpretation.

This notebook mostly details how to achieve kriging outputs and visualization with geolime, and doesn't not emphasis on the theoretical fundamentals of kriging formulas and algorithm, although basic knowledge of these estimation techniques are given as insights throughout this notebook.

For starters, on needs to prepare this Python notebook and import geolime and the necessary libraries used for data manipulation.

In [1]:

%matplotlib inline

In [2]:

# loading packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

The geolime library can be imported as such:

In [3]:

import geolime as glm

Dataset¶

The Walker Lake dataset openly available for educational purposes will be used in this notebook. Several seminal geostatistics papers and acedemics use this widespread dataset for applying and analyzing geostatistics concepts, algorithm and models. This choice ensures that Deeplime's products and services through geolime can be compared to existing scientific papers or tools. This dataset is available as a CSV file within geolime and can be loaded with Python using the following commands.

In [4]:

walker_dataset_path = "../../tests/data/walker_dataset/walker.csv"
# load input dataset
data = pd.read_csv(walker_dataset_path, sep=' ')
data.loc[data['v']>1e10,'v']=0

We notice that this 2D spatial dataset that contains two generic variables of interest ( u and v as concentration in ppm, a chemical or contaminate mass measurement per unit volume) and one indicator variable (t).

In [5]:

data.head()

Out[5]:

	id	x	y	u	t
0	1	11	8	0.0	2
1	2	8	30	0.0	2
2	3	9	48	224.4	2
3	4	8	69	434.4	2
4	5	9	90	412.1	2

The following command displays basic data statistics, that will be used later on specifically for simple kriging.

In [6]:

data.describe()

Out[6]:

	id	x	y	u	v	t
count	470.000000	470.000000	470.000000	470.000000	470.000000	470.000000
mean	235.500000	111.089362	141.221277	435.298723	353.451702	1.904255
std	135.821574	70.894682	77.968954	299.882302	657.897488	0.294554
min	1.000000	8.000000	8.000000	0.000000	0.000000	1.000000
25%	118.250000	51.000000	80.000000	184.600000	0.000000	2.000000
50%	235.500000	89.000000	139.500000	424.000000	25.300000	2.000000
75%	352.750000	170.000000	208.000000	640.850000	428.625000	2.000000
max	470.000000	251.000000	291.000000	1528.100000	5190.100000	2.000000

Coordinates and variables of interest must first be converted to numerical arrays before applying kriging API functions.

In [7]:

coords = data[['x', 'y']].to_numpy()
values_u = data[['u']].to_numpy()
values_v = data[['v']].to_numpy()

The u and v variables of interested can be spatially displayed with Python. We can notice a non-uniform spread in the selected variable of interest. The kriging methods explained later will be be used as visual appraisal for spatial estimation or interpolation.

In [8]:

plt.scatter(coords[:,0], coords[:,1], c=values_u, cmap='viridis')

Out[8]:

<matplotlib.collections.PathCollection at 0x7f2ebc71b220>

In [9]:

plt.scatter(coords[:,0], coords[:,1], c=values_v, cmap='viridis')

Out[9]:

<matplotlib.collections.PathCollection at 0x7f2ebc603c40>

Histograms can also be displayed for further distribution information.

In [10]:

plt.hist(values_u)

Out[10]:

(array([99., 77., 78., 83., 64., 45., 15.,  5.,  1.,  3.]),
 array([   0.  ,  152.81,  305.62,  458.43,  611.24,  764.05,  916.86,
        1069.67, 1222.48, 1375.29, 1528.1 ]),
 <BarContainer object of 10 artists>)

In [11]:

plt.hist(values_v)

Out[11]:

(array([368.,  48.,  26.,  15.,   6.,   2.,   1.,   3.,   0.,   1.]),
 array([   0.  ,  519.01, 1038.02, 1557.03, 2076.04, 2595.05, 3114.06,
        3633.07, 4152.08, 4671.09, 5190.1 ]),
 <BarContainer object of 10 artists>)

Variography¶

Prior to applying kriging, we first need to provide a spatial correlation metric and models called variogram. These are covered more in-depth in a separate notebook. But the semivariogram computed with geolime can be modelled for kriging using the following commands. We choose to autofit a Spherical covariance model, and it can be computed and displayed along with the experimental semivarogram with the following commands.

In [12]:

lags, tol = glm.generate_lags(lag=10, plag=50, nlags=10)
# compute variogram
vario_exp_u_omni = glm.variogram(coords=coords, values=values_u, lags=lags, tol=tol, geographic_azimuth=0, dip=0, pitch=0, atol=90)
cov = glm.Nugget() + 0.5 * glm.Spherical(2)
cov_model = glm.model_fit(variograms=[vario_exp_u_omni], cov=cov, constraints=None)
glm.plot_semivariogram(variograms=[vario_exp_u_omni], model=cov_model)

/home/weaklink/.virtualenvs/geolime/lib/python3.8/site-packages/lmfit/minimizer.py:825: RuntimeWarning: invalid value encountered in sqrt
  par.stderr = sqrt(self.result.covar[ivar, ivar])
/home/weaklink/.virtualenvs/geolime/lib/python3.8/site-packages/lmfit/minimizer.py:832: RuntimeWarning: invalid value encountered in sqrt
  (par.stderr * sqrt(self.result.covar[jvar, jvar])))

Out[12]:

(<Figure size 432x288 with 1 Axes>,
 <AxesSubplot:xlabel='Average distance', ylabel='Semivariance'>)

The choice of a Spherical covariance model appears to be a good default choice for the following estimation steps.

Simple kriging¶

Simple kriging is the starting point for spatial data estimation. Kriging is the best linear estimator for interpolation. But the mean used for kriging weights computation, which are used to estimate the unknown points on a target frid, is considered to be known prior to applying the kriging algorithm. It is then given as an input argument for the main kriging API function. This function takes several input arguments, and more particularly a target grid that defines the estimation resolution. It also requires the covariance model computed beneath and a neighborhood. This function outputs estimation values for the input grid and standard deviation for each grid points. Below, a 2D example on the u variable of interest.

Ths following API calls define the target grid that will be used for all the remaining kriging algorithm throughout this notebook. It is also used to generate a pattern variable that requires a discretization parameter that will be explained in the next section when presenting block kriging.

In [13]:

# create 3d grid
target_grid = glm.create_grid_2d(origin=[0, 0], cell_size=[10, 10], n_dim=[26,30], angle=[0, 0, 0])
discretization = [1, 1]
pattern = glm.GridPatternGenerator(target_grid, discretization)

The unique neighborhood used as input argument can be retrieved using the following API call.

In [14]:

neigh = glm.Neighborhood()

Kriging is effectively applied with the following call, and provides numerical arrays that can be converted to a dataframe for a more user-friendly management data structure.

In [15]:

result_sk = glm.block_kriging(coords, values_u, target_grid.get_center(), cov, neigh, pattern, mean=np.mean(values_u))
result_sk_df = pd.DataFrame(data=result_sk, columns=['estim', 'stdev'])

Here we display the first rows of the estimated and standard deviation for each point for the associated target grid defined beneath.

In [16]:

result_sk_df.head()

Out[16]:

	estim	stdev
0	226.207431	176.373001
1	136.360991	158.127624
2	168.077243	170.273039
3	274.942748	152.214909
4	459.397363	138.197635

Esimated points can be displayed as a 2D plot of the interpolated values with the following API call.

In [17]:

glm.display_grid(target_grid, result_sk_df['estim'].to_numpy(), coords, z=None, title="Simple kriging")

Out[17]:

(<Figure size 432x288 with 2 Axes>,
 <AxesSubplot:title={'center':'Simple kriging'}>)

We can visually observe the grid resolution and the estimated points with this plot. The spatial gaps are consistently filled throughout the grid and can be related to the variable dataset displayed in the first notebook section. The colour bar indicates the estimated point values.

Simple kriging can also be computed with a custom neighborhood (or sliding neighborhood). Doing so restrains the point search to a user-defined area. It can be defined using the following class.

In [18]:

custom_neigh = glm.CustomNeigh(dim=2, angles=[0, 0, 0], scales=[50, 50], n=100)
result_sk = glm.block_kriging(coords, values_u, target_grid.get_center(), cov, custom_neigh, pattern, mean=np.mean(values_u))
result_sk_df = pd.DataFrame(data=result_sk, columns=['estim', 'stdev'])
glm.display_grid(target_grid, result_sk_df['estim'].to_numpy(), coords, z=None, title="Simple kriging with sliding neighborhood")

<ipython-input-1-63c341fdce3d>:1: DeprecatedWarning: CustomNeigh is deprecated as of 0.2 and will be removed in 0.3. Use the MinMaxPointsNeighborhood class instead
  custom_neigh = glm.CustomNeigh(dim=2, angles=[0, 0, 0], scales=[50, 50], n=100)

Out[18]:

(<Figure size 432x288 with 2 Axes>,
 <AxesSubplot:title={'center':'Simple kriging with sliding neighborhood'}>)

The neighborhood have deliberately been underfitted as to show the effect of a restrained search. This can be effectively be seen on the former plot.

Ordinary kriging¶

Ordinary kriging provides a more general approach compared to simple kriging because no assumptions are provided for the variable of interest mean. More specifically, kriging is also applied to estimate the mean of the variable of interest to be estimated. Ordinary kriging applies to kriging steps, one on the variable's mean and another on the spatial grid using the former result in the same fashion as seen in the simple kriging section. This section shows how to apply punctual and block kriging.

Punctual kriging¶

Using the previously defined target grid, and still considering the same unique neighborhood as input argument and covariance model fitted in the variography section, ordinary kriging is applied on the entire target grid with the same function as seen in the simple kriging section, minus the optional argument. If mean is not supplied during the API call, ordinry kriging will be applied as the default kriging method.

The pattern argument previously defined allows the user to define the block size. This list has been set to be defaulted for punctual kriging, meaning that we don't subdivide the grid for the moment. Ordinary kriging outputs follows the same coding practice seen before, and the output stays similar to simple kriging output.

In [19]:

result_ok = glm.block_kriging(coords, values_u, target_grid.get_center(), cov_model, neigh, pattern)
result_ok_df = pd.DataFrame(data=result_ok, columns=['estim', 'stdev'])
result_ok_df.head()

Out[19]:

	estim	stdev
0	149.226151	163.822666
1	82.188147	143.606830
2	115.612216	156.858772
3	230.391517	136.939538
4	422.235622	121.074634

As average kriging is computed for mean estimation we notice a slight graphic difference in the visual output. But the display function used previously still displays a grid resolutionlarity for the estimated points seen beneath.

In [20]:

glm.display_grid(target_grid, result_ok_df['estim'].to_numpy(), coords, z=None, title="Ordinary kriging")

Out[20]:

(<Figure size 432x288 with 2 Axes>,
 <AxesSubplot:title={'center':'Ordinary kriging'}>)

Block kriging¶

Block kriging still uses the same versatile kriging function, but is now defined through a different discretization parameter. If we consider 10 meters square blocks, block kriging can effectively be computed by modyfing the input pattern. The pattern will define how many points are estimated in a block (here 5x5). Considering 10x10 blocks, the pattern will average 25 estimates in each block. The following chunk of code will help to understand how to to define the block size to be applied on the target grid.

In [21]:

pattern_bk = glm.GridPatternGenerator(target_grid, [5, 5])
result_bk = glm.block_kriging(coords, values_u, target_grid.get_corner(), cov_model, neigh, pattern_bk)
result_bk_df = pd.DataFrame(data=result_bk, columns=['estim', 'stdev'])

In [22]:

result_bk_df.head()

Out[22]:

	estim	stdev
0	212.084107	198.772160
1	153.659345	174.801128
2	133.789663	180.494429
3	204.268882	178.490390
4	348.108971	170.147650

In [23]:

glm.display_grid(target_grid, result_bk_df['estim'].to_numpy(), coords, z=None, title="Block kriging")

Out[23]:

(<Figure size 432x288 with 2 Axes>,
 <AxesSubplot:title={'center':'Block kriging'}>)

Afterword¶

This notebook has shown how to properly handle geolime for simple and ordinary kriging (that can both be used in a punctual or block approach).

Kriging requires preliminary requirements, such as a input covariance model, a custon or unique neighborhood and a target grid
Outputs can be checked visually by displaying the target grid

Discrete gaussian kriging can also be implemented using geolime, but introduces geostatistics concepts such as change of support that are detailed in another notebook.