Iron Population Clustering#

In data science, a domain defines the type of data value that is held by an attribute in a data model. For the Rocklea dome case we want to define domains in which samples would have a similar iron concentration. It can be two different domains for instance, one with a high iron concentration and the other one with a low iron concentration.

Different types of domain#

Different domains are defined by the way they are created. hose created through classical clustering show poor spatial contiguity compared to ones obtained by geostatistical clustering. Classical clustering does not include spatial information and tends to produce spatially scattered groups of samples which is a problem in mining exploitation whereas geostatistical clustering allows to have spacially contiguous clusters. This method allows to have better domains in mining exploration but is more complicated to use. An alternative approach consists in using classical clustering methods on spacially contiguous data set.

To visualize the different domains we can represent them through histograms.

Histograms#

A histogram is an approximate representation of the distribution of numerical data. To construct a histogram the first step is to divide the entire range of values into a series of intervals and then count how many values fall into each interval. The vertical y-axis represents the number count or percentage of occurences in the data for each column.

First we need to import the useful libraries and csv files.

import geolime as geo
from pyproj import CRS
import numpy as np
from sklearn.cluster import KMeans
import pyvista as pv

pv.set_jupyter_backend('panel')


geo.Project().set_crs(CRS("EPSG:20350"))
/tmp/ipykernel_5083/829011732.py:7: PyVistaDeprecationWarning: `panel` backend is deprecated and is planned for future removal.
  pv.set_jupyter_backend('panel')
dh = geo.read_file("../data/domained_drillholes.geo")
dh.user_properties()
['X_COLLAR',
 'Y_COLLAR',
 'Z_COLLAR',
 'X_M',
 'Y_M',
 'Z_M',
 'X_B',
 'Y_B',
 'Z_B',
 'X_E',
 'Y_E',
 'Z_E',
 'Fe_pct',
 'Al2O3',
 'SiO2_pct',
 'K2O_pct',
 'CaO_pct',
 'MgO_pct',
 'TiO2_pct',
 'P_pct',
 'S_pct',
 'Mn_pct',
 'Fe_ox_ai',
 'hem_over_goe',
 'kaolin_abundance',
 'kaolin_composition',
 'wmAlsmai',
 'wmAlsmci',
 'carbai3pfit',
 'carbci3pfit',
 'Sample_ID',
 'Fe',
 'Fe2o3',
 'P',
 'S',
 'SiO2',
 'MnO',
 'Mn',
 'CaO',
 'K2O',
 'MgO',
 'Na2O',
 'TiO2',
 'LOI_100',
 'Depth',
 'ellipsoidal_distance',
 'OreZone',
 'domain_code',
 'domain']

To determine the number of domains that we need we can look at the geology of the Rocklea dome area. The biggest part of the iron deposits have been made during the same era and form a layer of alternating shales. As this formation is the only one exploited for iron mining we can create two domains, one rich in iron and one poor in iron where the limit will be this iron rich formation. There is also a paleochannel very rich in iron but it takes its source in the precedent rich iron deposit so we can assume they both belongs to the same domain.

To create these two domains we can have a first look at the ferric distribution in the area.

geo.histogram_plot(data=[{"object": dh, "property": "Fe_pct"}], nbins=20)

This histogram shows a bivariate distribution of the iron concentration in the area. The first peak corresponds to a low iron concentration and the second one to a high iron concentration. The separation between these two domains seems to be around 30 %. If we look at the two domains created we have :

geo.histogram_plot(
    data=[
        {"object": dh, "property": "Fe_pct", "region": "Fe_pct <= 30"},
        {"object": dh, "property": "Fe_pct", "region": "Fe_pct >= 30"}
    ],
    nbins=30
)
dh.data("Fe_pct", region="Fe_pct >= 30")
array([37., 48., 40., ..., 44., 43., 39.])

As we can see we have two separate domains but we want to make a more accurate separation. For this we can use the chemical and mineralogical correlation study. We will consider four correlations : the positives correlations with the ferric oxides abundance and the kaolin abundance and the negative correlations with the silica and alumina.

Correlation between “Fe %” and “SiO2 %”#

This is the highest negative correlation, -0.9. A high concentration in silica implies a low concentration in iron. We can separate the domains according to the silica concentration but first we need to find the most accurate silica treshold where we will clearly have a domain with a high silica concentration and one with a low silica concentration.

geo.histogram_plot(data=[{"object": dh, "property": "SiO2_pct"}], nbins=20)

We see a big landing around 30. We can try different separation treshold to find the more accurate. To study the accuracy we can calculate the variance of the list of values. The smallest the variance is the better is the separation of the domains.

dh.data("Fe_pct", region="SiO2_pct < 30").var()
137.452806876235
geo.histogram_plot(data=[{"object": dh, "property": "Fe_pct", "region":"SiO2_pct < 30"}], nbins=30)
dh.data("Fe_pct", region="SiO2_pct < 25").var()
113.94029787165452
geo.histogram_plot(data=[{"object": dh, "property": "Fe_pct", "region":"SiO2_pct < 25"}], nbins=30)

We can choose 25 % for the silica treshold. Now let’s look at the complementary domain.

dh.data("Fe_pct", region="SiO2_pct > 25").var()
106.76456358233848
geo.histogram_plot(data=[{"object": dh, "property": "Fe_pct", "region":"SiO2_pct > 25"}], nbins=10)

We have two univariate domains.

geo.histogram_plot(
    data=[
        {"object": dh, "property": "Fe_pct", "region": "SiO2_pct < 25"},
        {"object": dh, "property": "Fe_pct", "region": "SiO2_pct > 25"}
    ],
    nbins=10
)

Finally, the precedent histogram shows the two different domains created with the SiO2 % parameters. We can do the same with the three other chemical and mineralogical data.

Correlation between “Fe %” and “Fe ox ai”#

The correlaation coefficient between these two elements is 0.8. A high abundance in ferric oxides implies a high concentration in iron.

geo.histogram_plot(data=[{"object": dh, "property": "Fe_ox_ai"}], nbins=10)

We can put separation around 0.20.

np.nanvar(dh.data("Fe_pct", region="Fe_ox_ai < 0.2"))
124.96694706994329
geo.histogram_plot(data=[{"object": dh, "property": "Fe_pct", "region":"Fe_ox_ai < 0.2"}], nbins=10)
np.nanvar(dh.data("Fe_pct", region="Fe_ox_ai > 0.2"))
153.0144338532727
geo.histogram_plot(data=[{"object": dh, "property": "Fe_pct", "region":"Fe_ox_ai > 0.2"}], nbins=10)
geo.histogram_plot(
    data=[
        {"object": dh, "property": "Fe_pct", "region": "Fe_ox_ai > 0.2"},
        {"object": dh, "property": "Fe_pct", "region": "Fe_ox_ai < 0.2"}
    ],
    nbins=10
)

Correlation between “Fe %” and “kaolin abundance”#

The correlation coefficient between these two elements is -0.8. A high abundance in kaolin implies a low concentration in iron.

geo.histogram_plot(data=[{"object": dh, "property": "kaolin_abundance"}], nbins=20)

We can look for a separation around 0.10.

np.nanvar(dh.data("Fe_pct", region="kaolin_abundance > 0.1"))
109.82931205247972
geo.histogram_plot(data=[{"object": dh, "property": "Fe_pct", "region":"kaolin_abundance > 0.1"}], nbins=10)
np.nanvar(dh.data("Fe_pct", region="kaolin_abundance < 0.1"))
126.38087246165082
geo.histogram_plot(data=[{"object": dh, "property": "Fe_pct", "region":"kaolin_abundance < 0.1"}], nbins=10)
geo.histogram_plot(
    data=[
        {"object": dh, "property": "Fe_pct", "region": "kaolin_abundance > 0.1"},
        {"object": dh, "property": "Fe_pct", "region": "kaolin_abundance < 0.1"}
    ],
    nbins=10
)
geo.histogram_plot(
    data=[
        {"object": dh, "property": "Fe_pct", "region": "kaolin_abundance > 0.12"},
        {"object": dh, "property": "Fe_pct", "region": "kaolin_abundance < 0.12"}
    ],
    nbins=10
)
geo.histogram_plot(
    data=[
        {"object": dh, "property": "Fe_pct", "region": "kaolin_abundance > 0.09"},
        {"object": dh, "property": "Fe_pct", "region": "kaolin_abundance < 0.09"}
    ],
    nbins=10
)

Correlation between “Fe %” and “Al2O3”#

The correlation coefficient between these two elements is -0.7.

geo.histogram_plot(data=[{"object": dh, "property": "Al2O3"}], nbins=20)
geo.histogram_plot(
    data=[
        {"object": dh, "property": "Fe_pct", "region": "Al2O3 > 8"},
        {"object": dh, "property": "Fe_pct", "region": "Al2O3 < 8"}
    ],
    nbins=20
)
geo.histogram_plot(
    data=[
        {"object": dh, "property": "Fe_pct", "region": "Al2O3 > 10"},
        {"object": dh, "property": "Fe_pct", "region": "Al2O3 < 10"}
    ],
    nbins=10
)
geo.histogram_plot(
    data=[
        {"object": dh, "property": "Fe_pct", "region": "Al2O3 > 14"},
        {"object": dh, "property": "Fe_pct", "region": "Al2O3 < 14"}
    ],
    nbins=10
)

Now that we studied all the correlations we can use them together to create more accurate domains.

geo.histogram_plot(
    data=[
        {"object": dh, "property": "Fe_pct", "region": "(SiO2_pct < 25) & (Fe_ox_ai > 0.2)"},
        {"object": dh, "property": "Fe_pct", "region": "(SiO2_pct > 25) & (Fe_ox_ai < 0.2)"}
    ],
    nbins=10
)

Silica and ferric oxide abundance already gave two great domains. We can also try to add the two other correlations.

geo.histogram_plot(
    data=[
        {"object": dh, "property": "Fe_pct", "region": "(SiO2_pct < 25) & (Fe_ox_ai > 0.2) & (kaolin_abundance < 0.09) & (Al2O3 < 14)"},
        {"object": dh, "property": "Fe_pct", "region": "(SiO2_pct > 25) & (Fe_ox_ai < 0.2) & (kaolin_abundance > 0.09) & (Al2O3 > 14)"}
    ],
    nbins=10
)

We can look at the variance of the different domains for the first and the last histogram to see if we have something more accurate thanks to the correlations.

For the domains created with the iron percentage we have :

dh.data("Fe_pct", region="Fe_pct < 30").var()
61.206917557444655
dh.data("Fe_pct", region="Fe_pct > 30").var()
51.02927302532986

And for those created with the correlation relation we have :

dh.data("Fe_pct", region="(SiO2_pct < 25) & (Fe_ox_ai > 0.2) & (kaolin_abundance < 0.09) & (Al2O3 < 14)").var()
22.618033424366082
dh.data("Fe_pct", region="(SiO2_pct > 25) & (Fe_ox_ai < 0.2) & (kaolin_abundance > 0.09) & (Al2O3 > 14)").var()
42.12592455798366
dh.set_property_expr(name="manual_classif_code", expr="sqrt(-1)")
dh.set_property_expr(name="manual_classif_code", expr="1", region="((SiO2_pct < 28) & (Fe_ox_ai > 0.15) & (Fe_pct > 25) & (Al2O3 < 14))")
dh.set_property_expr(name="manual_classif_code", expr="0", region="((SiO2_pct >= 28) | (Fe_ox_ai <= 0.15) | (Fe_pct <= 25) | (Al2O3 >= 14))")
dh.set_property_expr(name="manual_classif", expr="sqrt(-1)")
dh.set_property_expr(name="manual_classif", expr="'LowGrade'", region="manual_classif_code == 0")
dh.set_property_expr(name="manual_classif", expr="'HighGrade'", region="manual_classif_code == 1")
geo.histogram_plot(
    data=[
        {"object": dh, "property": "Fe_pct", "region": "manual_classif == 'LowGrade'"},
        {"object": dh, "property": "Fe_pct", "region": "manual_classif == 'HighGrade'"}
    ],
    nbins=10
)

We can notice that for both domains the variance is lower with the correlation separation, which means that these are more accurate and better for mining exploitation.

What about automatic domaining ?#

X = dh.data(["Fe_ox_ai", "SiO2_pct", "Al2O3", "Fe_pct"], "Fe_ox_ai + SiO2_pct + Al2O3 + Fe_pct > 0")
kmeans = KMeans(n_clusters=2).fit(X)
dh.set_property_expr(name="sk_classif_code", expr="sqrt(-1)")
dh.set_property_value(name="sk_classif_code", data=kmeans.labels_, region="Fe_ox_ai + SiO2_pct + Al2O3 + Fe_pct > 0")
dh.set_property_expr(name="sk_classif", expr="sqrt(-1)")
dh.set_property_expr(name="sk_classif", expr="'LowGrade'", region="sk_classif_code == 0")
dh.set_property_expr(name="sk_classif", expr="'HighGrade'", region="sk_classif_code == 1")
np.unique(kmeans.labels_)
array([0, 1], dtype=int32)
geo.histogram_plot(
    data=[
        {"object": dh, "property": "Fe_pct", "region": "sk_classif == 'LowGrade'"},
        {"object": dh, "property": "Fe_pct", "region": "sk_classif == 'HighGrade'"}
    ],
    nbins=10
)
dh.data("Fe_pct", region="sk_classif == 'LowGrade'").var()
81.35939293508208
dh.data("Fe_pct", region="sk_classif == 'HighGrade'").var()
61.011119253014414
dh.to_file("../data/dh_pop_classif")

Comparison with Geometric Modelling#

gs = geo.read_file("../data/surfaces.geo")
gs_pv_low = gs.to_pyvista('hard_top_Lower')
gs_pv_up = gs.to_pyvista('hard_top_Upper')
gs_pv_ore = gs.to_pyvista('hard_top_OreZone')
dh_pv = dh.to_pyvista('domain_code')
domain_solid = geo.datasets.load("rocklea_dome/domain_mesh.dxf")
domain_solid_pv = domain_solid.to_pyvista()
p = pv.Plotter()

p.add_mesh(gs_pv_low, color='r')
p.add_mesh(domain_solid_pv, color='y')

p.add_mesh(dh_pv.tube(radius=20))

p.set_scale(zscale=20)

p.show()