# ASSETS – Model analysis and further use

In this second half of the paper we explore different features that
complement and expand the construction of the geological model itself.
These extensions are just some examples of how *GemPy* can be used as a
geological modeling engine in diverse research projects. The numerous
libraries in the open-source ecosystem make it possible to choose the
best narrow-purpose tool for each specific task. Consider the
visualization of *GemPy*, for instance: *matplotlib*
[C1] for 2-D visualization, *vtk*
for fast and interactive 3-D visualization, *steno3D* for sharing block
model visualizations online, or even the open-source 3-D modeling
software Blender [C2] for creating high-quality
renderings and Virtual Reality scenes are only some examples of the
flexibility that the combination of *GemPy* with other open-source
packages offers. In the same fashion, we can use the geological model as
the basis for subsequent geophysical and process simulations. Due to
Python's modularity, combining distinct modules to extend the scope of a
project and embed the geological modeling process into a specific
environment is effortless. In the next sections we dive into some of the
built-in functionality implemented to date on top of the geological
modeling core. Current assets are: (i) 2-D and 3-D visualization, (ii)
forward calculation of gravity, (iii) topology analysis, (iv)
uncertainty quantification (UQ), and (v) full Bayesian inference.

## Visualization

The segmentation of meaningful units is the central task of geological modeling. It is often a prerequisite for engineering projects or process simulations. An intuitive 3-D visualization of a geological model is therefore a fundamental requirement.

For its data and model visualization, *GemPy* makes use of freely
available tools in the Python module ecosystem to allow the user to
inspect data and modeling results from all possible angles. The
fundamental plotting library *matplotlib*
[C1], enhanced by the statistical
data visualization library *seaborn*
[C3], provides the 2-D
graphical interface to visualize input data and 2-D sections of scalar
fields and geological models. In addition, making use of *pyqt* in
combination with *matplotlib*, we can generate interactive sequence
piles, in which the user can not only visualize the temporal relations
of the different unconformities and faulting events, but also modify
them using intuitive drag-and-drop functionality (see figure
[fig:vtk]).

On top of these features, *GemPy* offers built-in 3-D visualization
based on the open-source Visualization Toolkit
[C4]. It provides
users with an interactive 3-D view of the geological model, as well as
three additional orthogonal viewpoints (see figure [fig:vtk]). The user
can decide to plot just the data, the geological surfaces, or both.
Beyond pure visualization, *GemPy* makes use of the
interaction capabilities provided by *vtk* to let the user move
input data points on the fly via drag-and-drop. Combined with
*GemPy*'s optimized modeling process (and the ability to use GPUs for
efficient model calculation), this feature allows data modification
with real-time updating of the geological model (on the order of
milliseconds per scalar field). This functionality can not only improve
the understanding of the model, but can also help the user obtain the
desired outcome by working directly in 3-D space while getting immediate
visual feedback on the modeling results. However, since computational
time increases exponentially with the number of input data points and
the model resolution, very large and complex models may not render fast
enough on conventional computer systems to be perceived as continuous.

For additional high-quality visualization, we can generate VTK files
using *pyevtk*. These files can later be loaded into external VTK
viewers such as Paraview [C5] in order to take
advantage of its intuitive interface and powerful visualization options.
Another natural compatibility exists with Blender
[C2] due to its use of Python as front-end.
Using the Python distribution shipped with a Blender installation, it
is possible to import, run and automatically represent *GemPy*'s data
and results (figure [fig:examples], see appendix [blender] for code).
This not only allows rendering high-quality images and videos,
but also visualizing the models in Virtual Reality, making use of the
Blender Game Engine and the plug-ins that enable this
functionality.

For sharing models, *GemPy* also includes functionality to upload
discretized models to the Steno 3D platform (a freemium business model).
There, the model can be visualized, manipulated and shared with any
number of people effortlessly, by simple invitation or distribution of
a link.

In short, *GemPy* is not limited to a single visualization library.
Currently, *GemPy* supports many of the available visualization
options to fulfill the different needs of developers. However, these
are by no means the only possible alternatives, and in the future we
expect *GemPy* to be employed as the backend of further projects.

## Gravity forward modeling

In recent years gravity measurements have increased in quality [C6] and are by now a valuable additional geophysical data source to support geological modeling. There are different ways to include this information in the modeling workflow, one of the most common being inversion [C7]. Geophysics can validate the quality of a model in a probabilistic or optimization framework, and by back-propagating information it can also automatically improve the modeling process itself. As a drawback, simulating forward geophysics adds a significant computational cost and increases the uncertainty in the parametrization of the model. However, due to the amount of uncorrelated information, which is often continuous in space, the inclusion of geophysical data in the modeling process usually becomes significant for evaluating the quality of a given model.

*GemPy* includes built-in functionality to compute forward gravity
while conserving the automatic differentiation of the package. It is
calculated from the discretized block model by applying the method of
[C8] for rectangular prisms in the
\(z\) direction,

\[
F_z = G_\rho \, \left. \left. \left. \Big( x \ln(y + r) + y \ln(x + r) - z \arctan \frac{x y}{z r} \Big) \right|_{x_1}^{x_2} \right|_{y_1}^{y_2} \right|_{z_1}^{z_2},
\]

where \(x\), \(y\), and \(z\) are the Cartesian components of the prism corners relative to the measuring point, \(r\) the Euclidean distance and \(G_\rho\) the average gravity pull of the prism. This integration provides the gravitational pull of every voxel for a given density and distance in the component \(z\). Taking advantage of the immutability of the involved parameters, with the exception of density, allows us to precompute the geometric decomposition \(t_z\), leaving just its product with the weight \(G_\rho\),

\[
F_z = G_\rho \cdot t_z,
\]

as a recurrent operation.
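The prism formula can be sketched directly in numpy. The helper below (names are ours, not *GemPy* API) evaluates the expression at the eight prism corners with the alternating signs of the triple evaluation bar, using `arctan2` as a numerically robust arctangent; \(F_z\) is then the product of the geometric factor with \(G\rho\):

```python
import numpy as np

def prism_tz(x1, x2, y1, y2, z1, z2):
    """Geometric factor t_z of a rectangular prism (after Nagy, 1966).
    Corner coordinates are given relative to the measurement point."""
    tz = 0.0
    for sx, x in ((-1, x1), (1, x2)):          # evaluation bar in x
        for sy, y in ((-1, y1), (1, y2)):      # evaluation bar in y
            for sz, z in ((-1, z1), (1, z2)):  # evaluation bar in z
                r = np.sqrt(x * x + y * y + z * z)
                tz += sx * sy * sz * (x * np.log(y + r) + y * np.log(x + r)
                                      - z * np.arctan2(x * y, z * r))
    return tz

G = 6.674e-11   # gravitational constant [m^3 kg^-1 s^-2]
rho = 2920.0    # prism density [kg/m^3]
# A 10 m x 10 m x 9 m prism buried 1 m below the measurement point
fz = G * rho * prism_tz(-5.0, 5.0, -5.0, 5.0, -10.0, -1.0)
```

Because only the density changes between iterations, `prism_tz` can be evaluated once per voxel/measurement pair and cached, which is exactly the precomputation *GemPy* performs.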

As an example, we show here the forward gravity response of the geological model in figure [fig:model_comp]c. The first important detail is the increased extent of the interpolated model to avoid boundary errors. In general, a padding equal to the maximum distance used in the forward gravity computation would be the ideal value. In this example (figure [fig:gravity]) we add 10\(\; \text{km}\) to the X and Y coordinates. The next step is to define the 2-D measurement grid, i.e. where to simulate the gravity response, and the densities of each layer. The densities chosen are 2.92, 3.1, 2.61 and 2.92\(\; \text{g/cm}^3\) for the basement, the "Unconformity" layer (i.e. the layer on top of the unconformity), Layer 1 and Layer 2, respectively.

```
import matplotlib.pyplot as plt
import gempy as gp
# Main data management object. The extent must be large enough with respect to
# the forward gravity plane to account for the effect of all cells within a
# given distance d in any spatial direction x, y, z.
geo_data = gp.create_data(extent=[-10, 30, -10, 20, -10, 0],
                          resolution=[50, 50, 50],
                          path_o="paper_Foliations.csv",
                          path_i="paper_Points.csv")
# Defining the series of the sequential pile
gp.set_series(geo_data, series_distribution={'fault_serie1': 'fault1',
                                             'younger_serie': 'Unconformity',
                                             'older_serie': ('Layer1', 'Layer2')},
              order_formations=['fault1', 'Unconformity', 'Layer2', 'Layer1'])
# Creating object with data prepared for interpolation and compiling
interp_data_g = gp.InterpolatorData(geo_data, output='gravity')
# Setting the 2-D grid of the airborne plane where we compute the forward gravity
gp.set_geophysics_obj(interp_data_g, ai_extent=[0, 20, 0, 10, -10, 0],
                      ai_resolution=[30, 10])
# Making all possible precomputations: decomposing the value tz for every point
# of the 2-D grid with respect to each voxel
gp.precomputations_gravity(interp_data_g, 25, densities=[2.92, 3.1, 2.61, 2.92])
# Computing gravity (Eq. 10)
lith, fault, grav = gp.compute_model(interp_data_g, 'gravity')
# Plotting lithology section
gp.plot_section(geo_data, lith[0], 0, direction='z', plot_data=True)
# Plotting forward gravity
plt.imshow(grav.reshape(10, 30), cmap='viridis', origin='lower',
           alpha=0.8, extent=[0, 20, 0, 10])
```

The computation of forward gravity is a required step towards a fully coupled gravity inversion. Embedding this step into a Bayesian inference allows conditioning the initial data used to create the model on the final gravity response. This idea will be further developed in Section [sec:geol-invers-prob].

## Topology

The concept of topology provides a useful tool to describe adjacency
relations in geomodels, such as stratigraphic contacts or across-fault
connectivity
[C9][Thiele:2016vg].
*GemPy* has in-built functionality to analyze the adjacency topology of
its generated models as Region Adjacency Graphs (RAGs), using the
`topology_compute` method (see Listing 6). It can be directly
visualized on top of model sections (see figure [fig:topology]), where
each unique topological region in the geomodel is represented by a graph
node, and each connection as a graph edge. The function outputs the
graph object G, the region centroid coordinates, a list of all the
unique node labels, and two look-up tables to conveniently reference
node labels and lithologies.

To analyze the model topology, *GemPy* makes use of a general connected
component labeling (CCL) algorithm to uniquely label all separated
geological entities in 3-D geomodels. The algorithm is provided via the
widely used, open-source, Python-based image processing library
*scikit-image* [C10] through the function
`skimage.measure.label`, which is based on the optimized algorithms of
[C11]. However, applying CCL directly to a 3-D geomodel fails to
discriminate a layer cut by a fault into two unique regions, because in
practice both sides of the fault are represented by the same label. To
achieve the detection of edges across the fault, we precondition the
3-D geomodel matrix, which contains just the lithology information
(layer id), with a 3-D matrix containing the information about the
faults (fault block id). This yields a 3-D matrix that combines the
lithology and the fault block information. This matrix can then be
successfully labeled using CCL with a 2-connectivity stamp, resulting
in a new matrix of uniquely labeled regions for the geomodel. From
this, an adjacency graph is generated using
`skimage.future.graph.RAG`, which creates a Region Adjacency Graph
(RAG) of all unique regions contained in a 2-D or 3-D matrix,
representing each region with a node and their adjacency relations as
edges, successfully capturing the topology information of the geomodel.
The connections (edges) are then further classified into either
stratigraphic or across-fault edges to provide further information. If
the argument `compute_areas=True` is given, the contact area of the
two regions of an edge (in number of voxels) is automatically
calculated and stored in the adjacency graph.
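The effect of this preconditioning step can be reproduced in a few lines. The sketch below is our own toy example, using `scipy.ndimage.label` as a stand-in for `skimage.measure.label`: it labels a small 2-D section with and without the fault-block information and extracts the adjacency edges of the resulting regions.

```python
import numpy as np
from scipy import ndimage

def label_regions(block):
    """Connected-component labeling per value: each lithology/fault-block
    combination gets its own set of uniquely labeled regions."""
    labels = np.zeros(block.shape, dtype=int)
    n_total = 0
    for v in np.unique(block):
        lab, n = ndimage.label(block == v)
        labels[lab > 0] = lab[lab > 0] + n_total
        n_total += n
    return labels, n_total

def adjacency_edges(labels):
    """Edges of the region adjacency graph: pairs of distinct labels
    that share a face (1-connectivity)."""
    edges = set()
    for a, b in ((labels[:, :-1], labels[:, 1:]),   # horizontal neighbors
                 (labels[:-1, :], labels[1:, :])):  # vertical neighbors
        for u, v in zip(a.ravel(), b.ravel()):
            if u != v:
                edges.add(frozenset((u, v)))
    return edges

# Toy section: layer 1 over layer 2, down-thrown across a vertical fault
lith = np.array([[1, 1, 2, 2],
                 [2, 2, 2, 2]])
fault_block = np.array([[0, 0, 1, 1],
                        [0, 0, 1, 1]])
_, n_plain = label_regions(lith)              # fault ignored: 2 regions
combined = lith + fault_block * (lith.max() + 1)
labels, n_fault = label_regions(combined)     # fault honored: 3 regions
```

Without the fault-block offset, the footwall and hanging-wall parts of layer 2 merge into one region; with it, they are separated and the across-fault edge becomes visible in the adjacency graph.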

```
...
Add Listing 3
...
# Computing result
lith, fault = gp.compute_model(interp_data)
# Compute topology
G, centroids, labels_unique, labels_lot, lith_lot = gp.topology_compute(geo_data, lith[0], fault[0], compute_areas=True)
# Plotting topology network
gp.plot_section(geo_data, lith[0], 5)
gp.topology_plot(geo_data, G, centroids)
```

## Stochastic Geomodeling and probabilistic programming

Raw geological data is noisy and measurements are usually sparse. As a result, geological models contain significant uncertainties [C12][Bardossy.2004][Lark:2013cj][Caers:2011jr][McLane:2008wz][Chatfield.1995] that must be addressed thoughtfully to reach a plausible level of confidence in the model. However, treating geological modeling stochastically implies many considerations: (i) of the tens or hundreds of variables involved in the mathematical equations, which ones should be latent? (ii) can we filter out those outcomes that represent unreasonable geological settings? and (iii) how can we use other sources of data, especially geophysics, to improve the accuracy of the inference itself?

The answers to these questions are still actively debated in research and are highly dependent on the type of mathematical and computational framework chosen. In the interpolation method explained in this paper, the parameters suitable to be treated as latent variables (see figure [fig:overall] for an overview of possible stochastic parameters) are the interface points \({\bf{x}}_{\alpha}\) (i.e. the 3 Cartesian coordinates \(x, \; y, \; z\)), the orientations \({\bf{x}}_{\beta}\) (i.e. the 3 Cartesian coordinates \(x, \; y, \; z\) and the components of the plane's normal \(Gx, \; Gy, \; Gz\)), or the densities used in the computation of the forward gravity. But not only parameters with physical meaning are suitable to be considered stochastic. Many mathematical parameters used in the kriging interpolation, such as the covariance at distance zero \(C_0\) (i.e. the nugget effect) or the range of the covariance \(r\) (see the appendix on the covariance function for an example), play a crucial role in the computation of the final models and, at best, are inferred by an educated guess to a greater or lesser extent [C13][Calcagno.2008]. To tackle this problem rigorously, it would be necessary to combine Bayesian statistics, information theory and sensitivity analysis, among other disciplines, but in essence all these methodologies begin with a probabilistic programming framework.

*GemPy* is fully designed to be coupled with probabilistic frameworks,
in particular with *pymc3* [C14], as both
libraries are based on *Theano*.

*pymc* is a series of Python libraries that provide intuitive tools to
build and subsequently infer complex probabilistic graphical models
[C15].
These libraries offer an expressive and clean syntax for writing and
using statistical distributions and different samplers. At the moment
two main libraries coexist due to their different strengths and
weaknesses. On the one hand, we have *pymc2* [C16],
written in FORTRAN and Python. *pymc2* does not allow gradient-based
sampling methods, since it does not have automatic differentiation
capabilities. However, for that same reason, model construction and
debugging are more accessible. Furthermore, not computing gradients
enables easy integration with third-party libraries and easy
extensibility to other scientific libraries and languages. Therefore,
for prototyping and lower-dimensional problems, where the posterior
can be tracked by Metropolis-Hastings methods
[C17], *pymc2* is still the go-to choice.

On the other hand, the latest version, *pymc3*
[C14], allows the use of next-generation
gradient-based samplers such as the No-U-Turn Sampler or Automatic
Variational Inference [C18]. These sampling methods are
proving to be a powerful tool for dealing with multidimensional
problems, i.e. models with a high number of uncertain parameters
[C19]. The weakness of these
methods is that they rely on the computation of gradients, which in
many cases cannot be manually derived. To circumvent this limitation,
*pymc3* makes use of the AD capabilities of *Theano*. Being built on top
of *Theano* confers to the Bayesian inference process all the
capabilities discussed in section [theano], in exchange for the clarity
and flexibility that pure Python provides.

In this context, the purpose of *GemPy* is to fill the gap of complex
algebra between the prior data and the observations, such as geophysical
responses (e.g. gravity or seismic inversions) or geological
interpretations (e.g. tectonics, model topologies). Since *GemPy* is
built on top of *Theano* as well, the compatibility with both libraries
is relatively straightforward. However, being able to encode most of the
conceivable probabilistic graphical models derived from often diverse
and heterogeneous data would be a herculean task. For this reason, most
of the construction of the PGM has to be coded by the user with the
building blocks that the *pymc* packages offer (see listing 6). By doing
so, we can guarantee full flexibility and adaptability to the
necessities of every individual geological setting.

For this paper we will use *pymc2* for its higher readability and
simplicity. The *pymc3* architecture is analogous, with the major
difference that the PGM is constructed in *Theano*, and therefore
symbolically (for examples using *pymc3* and *GemPy*, check the online
documentation detailed in Appendix [sec:documentation]).

### Uncertainty Quantification

An essential aspect of probabilistic programming is the inherent capability to quantify uncertainty. Monte Carlo error propagation [C20] was introduced into the field of geological modeling some years ago [C12][Jessell.2010][Lindsay:2012gx], exploiting the automation of the model construction that implicit algorithms offer.

In the example of this paper (figure [fig:uncertainty]-Priors), we place a normal distribution with a standard deviation of \(300 \,\)m around the Z position of the interface points of the initial model (figure [fig:model_comp]c). In other words, we allow the interface points that define the model to oscillate randomly and independently along the Z axis, following normal distributions, and subsequently compute the geomodels that the perturbed data describe.

The first step in the creation of a PGM is to define the parameters that
are supposed to be stochastic and the probability functions that
describe them. To do so, *pymc2* provides a large selection of
distributions as well as a clear framework to create custom ones. Once
we have created the stochastic parameters, we need to substitute the
corresponding *pymc2* objects for the initial values in the *GemPy*
database (`interp_data` in the snippets). Next, we just follow the usual
*GemPy* construction process, i.e. calling the `compute_model`
function, wrapping it in a deterministic *pymc2* decorator to declare
that this function is part of the probabilistic model (figure
[fig:PGM-prior]). After creating the graphical model, we can sample from
the stochastic parameters using Monte Carlo sampling with *pymc2*
methods.

```
...
Add Listing 3
...
# Copying the initial data
geo_data_stoch_init = deepcopy(interp_data.geo_data_res)
# MODEL CONSTRUCTION
# ==================
# Positions (rows) of the data we want to make stochastic
ids = range(2, 12)
# List with the stochastic parameters. pymc.Normal arguments: name, mean, precision (tau)
interface_Z_modifier = [pymc.Normal("interface_Z_mod_" + str(i), 0., 1. / 0.01 ** 2)
                        for i in ids]

# Modifying the input data at each iteration
@pymc.deterministic(trace=True)
def input_data(value=0, interface_Z_modifier=interface_Z_modifier,
               geo_data_stoch_init=geo_data_stoch_init,
               ids=ids, verbose=0):
    # First we extract from our original interp_data object the numerical data
    # that is necessary for the interpolation. geo_data_stoch is a pandas DataFrame
    geo_data_stoch = gp.get_data(geo_data_stoch_init, numeric=True)
    # Now we loop over each id that shares the same uncertainty variable (here,
    # each layer) and add the stochastic part to the initial value
    for num, i in enumerate(ids):
        interp_data.geo_data_res.interfaces.set_value(i, "Z",
            geo_data_stoch_init.interfaces.iloc[i]["Z"] + interface_Z_modifier[num])
    # Return the input data for the modeling function. Due to the way pymc2
    # stores the traces we need to save the data as numpy arrays
    return (interp_data.geo_data_res.interfaces[["X", "Y", "Z"]].values,
            interp_data.geo_data_res.orientations[["X", "Y", "Z", "dip", "azimuth", "polarity"]].values)

# Computing the geological model
@pymc.deterministic(trace=True)
def gempy_model(value=0, input_data=input_data, verbose=False):
    # Modify input data values accordingly
    interp_data.geo_data_res.interfaces[["X", "Y", "Z"]] = input_data[0]
    # Gx, Gy, Gz are just used for visualization. The Theano function gets azimuth, dip and polarity!
    interp_data.geo_data_res.orientations[["G_x", "G_y", "G_z", "X", "Y", "Z", 'dip', 'azimuth', 'polarity']] = input_data[1]
    # Some iterations will give a singular matrix, so we wrap the call in a
    # try block to avoid breaking the chain
    try:
        lb, fb, grav = gp.compute_model(interp_data, output='gravity')
        return lb, fb, grav
    except np.linalg.linalg.LinAlgError as err:
        # If it fails (e.g. some input data combinations lead to a singular
        # matrix), return an empty model with the same dimensions (just zeros)
        if verbose:
            print("Exception occurred.")
        return (np.zeros_like(lith_block), np.zeros_like(fault_block),
                np.zeros_like(grav_i))

# Extract the vertices at every iteration by applying the marching cubes algorithm
@pymc.deterministic(trace=True)
def gempy_surfaces(value=0, gempy_model=gempy_model):
    vert, simp = gp.get_surfaces(interp_data, gempy_model[0][1], gempy_model[1][1],
                                 original_scale=True)
    return vert

# We add all the pymc objects to a list
params = [input_data, gempy_model, gempy_surfaces, *interface_Z_modifier]
# We create the pymc model, i.e. the probabilistic graph
model = pymc.Model(params)
runner = pymc.MCMC(model)
# BAYESIAN INFERENCE
# ==================
# Number of iterations
iterations = 10000
# Inference. By default without likelihoods: sampling from priors
runner.sample(iter=iterations, verbose=1)
```

The suite of possible realizations of the geological model is stored, as traces, in a database of choice (HDF5, SQL or Python pickles) for further analysis and visualization.

In 2-D we can display all possible locations of the interfaces on a
cross-section at the center of the model (see figure
[fig:uncertainty]-Priors-2-D representation). However, the extension of
uncertainty visualization to 3-D is not as trivial. *GemPy* makes use of
the latest developments in uncertainty visualization for 3-D structural
geological modeling
[C21][Lindsay:2013cv][Lindsay:2013dr][Wellmann:2012wf].
The first method consists of representing the probability of finding a
given geological unit \(F\) at each discrete location in the model
domain. This can be done by defining a probability function

\[
p_F(x) = \frac{1}{n} \sum_{k=1}^{n} I_{F_k}(x),
\]

where \(n\) is the number of realizations and \(I_{F_k}(x)\) is the indicator function of the mentioned geological unit in realization \(k\) (figure [fig:uncertainty]-Probability shows the probability of finding Layer 1). However, this approach can only display each unit individually. A way to encapsulate geomodel uncertainty in a single parameter to quantify and visualize it is by applying the concept of information entropy [C22], based on the general concept developed by [C23]. For a discretized geomodel, the information entropy \(H\) at each voxel can be defined as

\[
H(x) = - \sum_{F} p_F(x) \log_2 p_F(x),
\]

where \(p_F(x)\) represents the probability of unit \(F\) at cell \(x\). Therefore, we can use information entropy to compress our uncertainty into a single value at each voxel, reflecting the possible number of outcomes and their relative probability (see figure [fig:uncertainty]-Entropy).
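Both measures are straightforward to compute from a stack of model realizations. The following numpy sketch (hypothetical helper names, not *GemPy* API) estimates the unit probabilities by counting occurrences and evaluates the per-voxel entropy in bits:

```python
import numpy as np

def cell_probabilities(lith_blocks):
    """Estimate p_F(x): the frequency of each unit F at each voxel
    over a stack of n model realizations (axis 0)."""
    lith_blocks = np.asarray(lith_blocks)
    units = np.unique(lith_blocks)
    return np.array([(lith_blocks == F).mean(axis=0) for F in units])

def cell_entropy(probs):
    """Per-voxel information entropy H(x) in bits; zero-probability
    terms contribute nothing by convention."""
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(probs > 0, probs * np.log2(probs), 0.0)
    return -terms.sum(axis=0)
```

For two realizations `[1, 1]` and `[1, 2]`, the first voxel is certain (H = 0) while the second is maximally uncertain between two units (H = 1 bit).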

### Geological inversion: Gravity and Topology

Although computing the forward gravity has value of its own for many
applications, the main aim of *GemPy* is to integrate all possible
sources of information into a single probabilistic framework. The use of
likelihood functions in a Bayesian inference, as opposed to simple
rejection sampling, has been explored by the authors in recent years
[C24][wellmann2017uncertainty][schaaf17master].
This approach makes it possible to tune the conditioning of stochastic
realizations by varying the probability density functions used as
likelihoods. In addition, Bayesian networks allow combining several
likelihood functions, generating a competition between the prior
distributions of the input data and the likelihood functions, resulting
in posterior distributions that best honor all the given information. To
give a flavor of what is possible, we apply custom likelihoods to the
previous example based on topology and gravity constraints in an
inversion.

As we have shown above, topological graphs can represent the connectivity among the segmented areas of a geological model. As is to be expected, stochastic perturbations of the input data can rapidly alter the configuration of these graphs. In order to preserve a given topological configuration, partially or totally, we can construct specific likelihood functions. To exemplify the use of a topological likelihood function, we will use the topology computed in section [sec:topology], derived from the initial model realization (figure [fig:topology] or [fig:uncertainty]-Likelihoods), as the "ideal topology". This could equally be based on an expert interpretation of kinematic data or deduced from auxiliary data.

The first challenge is to find a metric that captures the similarity of two graphs. As a graph is nothing but a set of nodes and their edges, we can compare the intersection and union of two different sets using the Jaccard index [C25][Thiele:2016wx]. It calculates the ratio of intersection and union of two given graphs A and B:

\[
J(A, B) = \frac{|A \cap B|}{|A \cup B|}.
\]

The resulting ratio is zero for entirely different graphs, while the metric rises as the sets of edges and nodes become more similar, reaching exactly one for an identical match. Therefore, the Jaccard index can be used to express the similarity of topology graphs as a single number that we can evaluate with a probability density function. The type of probability density function used determines how strongly the reference graph is enforced. Here, we use a half-Cauchy distribution (\(\alpha = 0\) and \(\beta = 10^{-3}\)) due to its tolerance to outliers.
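Written out for graphs represented as plain node and edge sets (a simplified stand-in for the graph objects that `topology_compute` returns), the index is:

```python
def jaccard_index(nodes_a, edges_a, nodes_b, edges_b):
    """Jaccard index J(A, B) = |A n B| / |A u B| over the combined sets
    of nodes and undirected edges of two topology graphs."""
    set_a = set(nodes_a) | {frozenset(e) for e in edges_a}
    set_b = set(nodes_b) | {frozenset(e) for e in edges_b}
    return len(set_a & set_b) / len(set_a | set_b)

# Identical graphs score 1.0; removing the 2-3 contact lowers the score
g = ({1, 2, 3}, {(1, 2), (2, 3)})
h = ({1, 2, 3}, {(1, 2)})
```

Edges are stored as `frozenset`s so that the comparison is insensitive to edge direction, matching the undirected nature of adjacency relations.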

```
...
Add Listing 6
...
# Computation of topology
@pymc.deterministic(trace=True)
def gempy_topo(value=0, gm=gempy_model, verbose=False):
    G, c, lu, lot1, lot2 = gp.topology_compute(geo_data, gm[0][0], gm[1],
                                               cell_number=0, direction="y")
    if verbose:
        gp.plot_section(geo_data, gm[0][0], 0)
        gp.topology_plot(geo_data, G, c)
    return G, c, lu, lot1, lot2

# Computation of the L2-norm for the forward gravity
@pymc.deterministic
def e_sq(value=original_grav, model_grav=gempy_model[2], verbose=0):
    square_error = np.sqrt(np.sum((value * 10**-7 - model_grav * 10**-7)**2))
    return square_error

# Likelihoods
# ===========
@pymc.stochastic
def like_topo_jaccard_cauchy(value=0, gempy_topo=gempy_topo, G=topo_G):
    """Compares the model output topology with a given topology graph G using
    an inverse Jaccard index embedded in a half-Cauchy likelihood."""
    # Jaccard-index comparison
    j = gp.Topology.compare_graphs(G, gempy_topo[0])
    # The last parameter adjusts the "strength" of the likelihood
    return pymc.half_cauchy_like(1 - j, 0, 0.001)

@pymc.observed
def inversion(value=1, e_sq=e_sq):
    return pymc.half_cauchy_like(e_sq, 0, 0.1)

# We add all the pymc objects to a list
params = [input_data, gempy_model, gempy_surfaces, gempy_topo, *interface_Z_modifier,
          like_topo_jaccard_cauchy, e_sq, inversion]
# We create the pymc model, i.e. the probabilistic graph
model = pymc.Model(params)
runner = pymc.MCMC(model)
# BAYESIAN INFERENCE
# ==================
# Number of iterations
iterations = 20000
# Inference. Adaptive Metropolis
runner.use_step_method(pymc.AdaptiveMetropolis, params, delay=1000)
runner.sample(iter=iterations, burn=1000, thin=20, tune_interval=1000, tune_throughout=True)
```

Gravity likelihoods exploit the spatial distribution of density, which can be related to different lithotypes [C26]. To test the likelihood function based on gravity data, we first generate the synthetic "measured" data. This is done simply by computing the forward gravity for one of the extreme models generated during the Monte Carlo error propagation in the previous section (chosen to highlight the effect that a gravity likelihood can have). This model is particularly characterized by its high dip values (figure [fig:uncertainty]-Synthetic model to produce forward gravity). Once we have an "observed" gravity, we can compare it to the simulated gravity response of each realization. To do so, we encapsulate the difference between the two into a single error value using an L2-norm. This error value acts as the input of the likelihood function, in this case a half-Cauchy distribution (\(\alpha = 0\) and \(\beta = 10^{-1}\)). This probability density function increases as the error approaches 0, while towards high error values the function flattens to accommodate possible measurement errors.

As sampler we use an adaptive Metropolis method ([C17]; for a more in-depth explanation of samplers and their importance see [C24]). This method adapts the Metropolis proposal step according to a covariance estimate that is updated every \(n\) iterations. For the results presented here, we performed 20000 iterations, tuning the adaptive covariance every 1000 steps (a convergence analysis can be found in the Jupyter notebooks attached to the online supplement of this paper).
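The adaptation mechanism can be illustrated with a toy sampler. This is our own sketch, not the *pymc2* implementation: after an initial delay, the proposal covariance is periodically re-estimated from the chain history, scaled by the commonly used \(2.38^2/d\) factor for \(d\)-dimensional targets:

```python
import numpy as np

rng = np.random.default_rng(0)

def adaptive_metropolis(logp, x0, n_iter=5000, adapt_every=1000, delay=1000):
    """Toy adaptive Metropolis: after `delay` samples, the proposal
    covariance is re-estimated from the chain every `adapt_every` steps."""
    d = len(x0)
    cov = np.eye(d) * 0.1
    chain = [np.asarray(x0, dtype=float)]
    lp = logp(chain[-1])
    for i in range(1, n_iter):
        if i > delay and i % adapt_every == 0:
            # Empirical covariance of the chain so far, plus a small jitter
            # to keep the proposal non-singular
            cov = np.cov(np.array(chain).T) * 2.38**2 / d + 1e-6 * np.eye(d)
        prop = rng.multivariate_normal(chain[-1], cov)
        lp_prop = logp(prop)
        # Standard Metropolis accept/reject step
        if np.log(rng.random()) < lp_prop - lp:
            chain.append(prop)
            lp = lp_prop
        else:
            chain.append(chain[-1].copy())
    return np.array(chain)
```

Running it on a standard 2-D Gaussian target (`logp = lambda x: -0.5 * np.sum(x**2)`) recovers mean and spread of the target; in *pymc2* the equivalent machinery is configured through `pymc.AdaptiveMetropolis` as shown in the listing above.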

As a result of applying likelihood functions, we can observe a clear
change in the posterior (i.e. the possible outcomes) of the inference. A
closer look shows two main zones of influence, each of them related to
one of the likelihood functions. On the one hand, we observe a reduction
of uncertainty along the fault plane due to the restrictions that the
topology function imposes by conditioning the models to high Jaccard
values. On the other hand, what in the first example (i.e. Monte Carlo
error propagation) was just an outlier now becomes the norm, with the
layers bending pronouncedly, due to the influence of the gravity
inversion. In both cases, it is important to keep in mind that the
degree of impact on the final model is inversely proportional to the
amount of uncertainty that each stochastic parameter carries. Finally,
we would like to remind the reader that the goal of this example is not
to obtain realistic geological models, but to serve as an example of how
the in-built functionality of *GemPy* can be used to handle similar
cases.

[C1] John D Hunter. Matplotlib: a 2D graphics environment. Computing in Science & Engineering, 9(3):90–95, 2007.

[C2] Blender Online Community. Blender – a 3D modelling and rendering package. Blender Foundation, Blender Institute, Amsterdam, 2017. URL: http://www.blender.org.

[C3] Michael Waskom, Olga Botvinnik, Drew O'Kane, Paul Hobson, Saulius Lukauskas, David C Gemperline, Tom Augspurger, Yaroslav Halchenko, John B. Cole, Jordi Warmenhoven, Julian de Ruiter, Cameron Pye, Stephan Hoyer, Jake Vanderplas, Santi Villalba, Gero Kunter, Eric Quintero, Pete Bachant, Marcel Martin, Kyle Meyer, Alistair Miles, Yoav Ram, Tal Yarkoni, Mike Lee Williams, Constantine Evans, Clark Fitzgerald, Brian, Chris Fonnesbeck, Antony Lee, and Adel Qalieh. Mwaskom/seaborn: v0.8.1 (September 2017). September 2017. URL: https://doi.org/10.5281/zenodo.883859, doi:10.5281/zenodo.883859.

[C4] Will J Schroeder, Bill Lorensen, and Ken Martin. The visualization toolkit: an object-oriented approach to 3D graphics. Kitware, 2004.

[C5] Utkarsh Ayachit. The ParaView guide: a parallel visualization application. 2015.

[C6] M Nabighian and others. The historical development of the gravity method in exploration (75th anniversary). Geophysics, 2005.

[C7] A. Tarantola. Inverse problem theory and methods for model parameter estimation. Society for Industrial Mathematics, 2005.

[C8] Dezsö Nagy. The gravitational attraction of a right rectangular prism. Geophysics, 31(2):362–371, 1966.

[C9] Samuel T Thiele, Mark W Jessell, Mark Lindsay, Vitaliy Ogarko, J F Wellmann, and Evren Pakyuz-Charrier. The topology of geology 1: Topological analysis. Journal of Structural Geology, 91:27–38, October 2016.

[C10] Stefan Van der Walt, Johannes L Schönberger, Juan Nunez-Iglesias, François Boulogne, Joshua D Warner, Neil Yager, Emmanuelle Gouillart, and Tony Yu. Scikit-image: image processing in Python. PeerJ, 2:e453, 2014.

[C11] Christophe Fiorio and Jens Gustedt. Two linear time union-find strategies for image processing. Theoretical Computer Science, 154(2):165–181, 1996.

[C12] J F Wellmann, Franklin G. Horowitz, Eva Schill, and Klaus Regenauer-Lieb. Towards incorporating uncertainty of structural data in 3D geological inversion. Tectonophysics, 490(3-4):141–151, 2010.

[C13] Jean-Paul Chiles, Christophe Aug, Antonio Guillen, and T. Lees. Modelling of geometry of geological units and its uncertainty in 3D from structural data: the potential-field method. Perth, 2004.

[C14] John Salvatier, Thomas V Wiecki, and Christopher Fonnesbeck. Probabilistic programming in Python using PyMC3. PeerJ Computer Science, 2(2):e55, April 2016.

[C15] D Koller and N Friedman. Probabilistic graphical models: principles and techniques. 2009.

[C16] Anand Patil, David Huard, and Christopher J. Fonnesbeck. PyMC: Bayesian stochastic modelling in Python. J. Stat. Softw, pages 1–81, 2010.

[C17] H Haario, E Saksman, and J Tamminen. An adaptive Metropolis algorithm. Bernoulli, 7(2):223–242, 2001.

[C18] Alp Kucukelbir, Rajesh Ranganath, Andrew Gelman, and David Blei. Automatic variational inference in Stan. In Advances in Neural Information Processing Systems, 568–576. 2015.

[C19] Michael Betancourt, Simon Byrne, Sam Livingstone, Mark Girolami, and others. The geometric foundations of Hamiltonian Monte Carlo. Bernoulli, 23(4A):2257–2298, 2017.

[C20] John F Ogilvie. A Monte-Carlo approach to error propagation. Computers & Chemistry, 8(3):205–207, 1984.

[C21] Mark Lindsay, Laurent Ailleres, Mark W Jessell, Eric de Kemp, and Peter G Betts. Locating and quantifying geological uncertainty in three-dimensional models: analysis of the Gippsland Basin, southeastern Australia. Tectonophysics, 546-547:10–27, April 2012.

[C22] J F Wellmann and Klaus Regenauer-Lieb. Uncertainties have a meaning: information entropy as a quality measure for 3-D geological models. Tectonophysics, 526-529:207–216, March 2012.

[C23] C. E. Shannon. A mathematical theory of communication. Bell System Technical Journal, 27:379–423, 1948.

[C24] Miguel de la Varga and J F Wellmann. Structural geologic modeling as an inference problem: a Bayesian perspective. Interpretation, 4(3):1–16, June 2016.

[C25] Paul Jaccard. The distribution of the flora in the alpine zone. New Phytologist, 11(2):37–50, 1912.

[C26] Michael Dentith and Stephen T Mudge. Geophysics for the mineral exploration geoscientist. Cambridge University Press, 2014.