# CORE - Geological modeling with GemPy

In this section, we describe the core functionality of *GemPy*: the
construction of 3-D geological models from geological input data
(surface contact points and orientation measurements) and defined
topological relationships (stratigraphic sequences and fault networks).
We begin with a brief review of the theory underlying the implemented
interpolation algorithm. We then describe the translation of this
algorithm and the subsequent model generation and visualisation using
the Python front-end of *GemPy* and how an entire model can be
constructed by calling only a few functions. Across the text, we include
code snippets with minimal working examples to demonstrate the use of
the library.

After describing the core functionality required to construct models,
we go deeper into the underlying architecture of *GemPy*. This part is
not only relevant for advanced users and potential developers, but also
highlights a key aspect: the link to *Theano*
[B1], a highly evolved Python library for
efficient vector algebra and machine learning. This link is essential
for the more advanced aspects of stochastic geomodeling and Bayesian
inversion, which are explained in the subsequent sections.

## Geological modeling and the potential-field approach

### Concept of the potential-field method

The potential-field method developed by
[B2] is the central method used to generate
3-D geological models in *GemPy*; it has already been successfully
deployed in the modeling software GeoModeller 3-D (see
[B3]). The general idea is to
construct an interpolation function \(\textbf{Z}({\bf{x}}_{0})\),
where \({\bf{x}}\) is any point in the continuous three-dimensional
space (\(x, y, z\)) \(\in \mathbb{R}^3\), which describes the
domain \(\mathcal{D}\) as a scalar field. The gradient of the scalar
field follows the direction of the anisotropy of the stratigraphic
structure or, in other words, every possible isosurface of the scalar
field represents a synchronal deposition of a layer (see
figure [fig:potfield]).

Let us break down what we actually mean by this: imagine a
geological setting formed by a perfect sequence of horizontal layers
piled one above the other. If we knew the exact time at which one of
these surfaces was deposited, we would know that any layer above had to
be deposited afterwards, while any layer below had to be deposited
earlier. Obviously, we cannot have data for each of these infinitesimal
synchronal layers, but we can interpolate the “date” between them. In
reality, the exact year of the synchronal deposition is
meaningless—since the related uncertainty would be out of proportion.
What has value for generating a 3-D geomodel is the location of those
synchronal layers and especially the lithological interfaces, where the
change of physical properties is notable. For this reason, instead of
interpolating *time*, we use a simple dimensionless parameter that we
refer to as the *scalar field value*.

The advantages of using a global interpolator instead of interpolating each layer of interest independently are twofold: (i) the location of one layer affects the location of others in the same depositional environment, making it impossible for two layers in the same potential field to cross; and (ii) it enables the use of data in between the interfaces of interest, widening the range of possible measurements that can be used in the interpolation.

The interpolation function is obtained as a weighted interpolation based on Universal CoKriging [B4]. Kriging, or Gaussian process regression [B5], is a spatial interpolation method that treats each input as a random variable, aiming to minimize the estimation variance in order to obtain the best linear unbiased predictor [B6]. Furthermore, it is possible to combine more than one type of data—i.e. a multivariate case, or CoKriging—to increase the amount of information in the interpolator, as long as we capture their relation using a cross-covariance. The main advantage in our case is the ability to utilize data sampled at different locations in space for the estimation. Simple Kriging, as a regression, only minimizes the second moment of the data (or variances). However, in most geological settings we can expect linear trends in our data—i.e. the mean thickness of a layer varies linearly across the region. This trend is captured by adding polynomial drift functions to the system of equations, in what is called Universal Kriging.
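As an aside, the mechanics of a Universal Kriging system can be sketched in a few lines of NumPy. The covariance model, range parameter and data below are made up for illustration; this is a 1-D univariate sketch, not GemPy's actual CoKriging formulation:

```python
import numpy as np

def cov(h, a=10.0):
    """Hypothetical Gaussian covariance with range parameter a."""
    return np.exp(-(h / a) ** 2)

x_data = np.array([0.0, 2.0, 5.0, 9.0])   # sample locations (made up)
z_data = np.array([1.0, 1.3, 2.1, 3.0])   # observed values with a linear trend

n = len(x_data)
C = cov(np.abs(x_data[:, None] - x_data[None, :]))   # data covariance block
F = np.column_stack([np.ones(n), x_data])            # drift functions: 1, x

# Universal Kriging system: covariance block bordered by the drift functions
K = np.block([[C, F], [F.T, np.zeros((2, 2))]])

def predict(x0):
    """Best linear unbiased predictor at location x0."""
    rhs = np.concatenate([cov(np.abs(x_data - x0)), [1.0, x0]])
    weights = np.linalg.solve(K, rhs)
    # the drift constants enter through the unbiasedness constraints;
    # the prediction itself only applies the weights to the data
    return weights[:n] @ z_data
```

At the data locations the predictor reproduces the observations exactly, which is the defining property of an exact interpolator.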

### Adjustments to structural geological modeling

So far we have shown what we want to obtain and how Universal CoKriging
is a suitable interpolation method to get there. In the following, we
will describe the concrete steps from taking our input data to the final
interpolation function \(\textbf{Z}({\bf{x}}_{0})\), which describes
the domain. Much of the complexity of the method comes from the
difficulty of keeping highly nested nomenclature consistent across
literature. For this reason, we will try to be especially verbose
regarding the mathematical terminology. The terms of *potential field*
(originally coined by [B2]) and *scalar
field* (preferred by the authors) are used interchangeably throughout
the paper. The result of a Kriging interpolation is a random function,
and hence both *interpolation function* and *random function* are used
to refer to the function of interest \(\textbf{Z}({\bf{x}}_{0})\). The
CoKriging nomenclature quickly grows complicated, since it has to
consider *p* random functions \(\bf{Z}_{\it{i}}\), with \(p\)
being the number of distinct parameters involved in the interpolation,
sampled at different points \(\bf{x}\) of the three-dimensional
domain \(\mathbb{R}^3\). Two types of parameters are used to
characterize the *scalar field* in the interpolation: (i) layer
interface points \({\bf{x}}_{\alpha}\) describing the respective
isosurfaces of interest—usually the interface between two layers; and
(ii) the gradients of the scalar field, \({\bf{x}}_{\beta}\)—or in
geological terms: poles of the layer, i.e. normal vectors to the dip
plane. Therefore, gradients will be oriented perpendicular to the
isosurfaces and can be located anywhere in space. We will refer to the
main random function—the scalar field itself—\({\bf{Z}}_{\alpha}\)
simply as \(\bf{Z}\), and to its set of samples as
\({\bf{x}}_{\alpha}\), while the second random function
\({\bf{Z}}_{\beta}\)—the gradient of the scalar field—will be
referred to as \(\partial {\bf{Z}}/ \partial u\) and its samples as
\({\bf{x}}_{\beta}\), so that we can capture the relationship
between the potential field \(\bf{Z}\) and its gradient as

\[\frac{\partial {\bf{Z}}}{\partial u}({\bf{x}}) = \lim_{p \to 0} \frac{{\bf{Z}}({\bf{x}} + p\,{\bf{u}}) - {\bf{Z}}({\bf{x}})}{p}\]
It is also important to keep the points of every individual synchronal layer identified, since they share the same scalar field value. Therefore, samples that belong to a single layer \(k\) will be expressed as a subset denoted by a superscript, \({\bf{x}}_\alpha ^k\), and every individual point by a subscript, \({\bf{x}}_{\alpha \, i}^k\) (see figure [fig:potfield]).
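This per-layer bookkeeping maps naturally onto a tabular structure. A minimal *pandas* sketch (the coordinates and the column names here are made up for illustration):

```python
import pandas as pd

# Interface points grouped per layer, mirroring the x_alpha_i^k notation:
# each row is one point x_alpha_i^k, grouped by its layer k.
interfaces = pd.DataFrame({
    'X': [0.0, 2.0, 1.0, 3.0],
    'Z': [-2.0, -2.1, -5.0, -5.2],
    'formation': ['Layer1', 'Layer1', 'Layer2', 'Layer2'],
})

# One reference point x_alpha_0^k per layer; the remaining points of each
# group play the role of the x_alpha_i^k in the difference constraints.
ref_points = interfaces.groupby('formation').first()
```

Keeping one reference point per group is exactly what the difference formulation of the constraints below requires.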

Note that in this context the data does not have any meaningful physical parameter associated with it that we want to interpolate, as long as the stratigraphic deposition follows the gradient direction. Therefore, the two constraints we want to enforce in the interpolated scalar field are: (i) all points belonging to a given interface \({\bf{x}}_{\alpha \, i}^k\) must have the same scalar field value (i.e. there is an isosurface connecting all data points),

\[{\bf{Z}}({\bf{x}}_{\alpha \, i}^k) - {\bf{Z}}({\bf{x}}_{\alpha \, 0}^k) = 0\]

where \({\bf{x}}_{\alpha \, 0}^k\) is a reference point of the interface; and (ii) the scalar field will be perpendicular to the poles \({\bf{x}}_{\beta}\) anywhere in 3-D space.

Considering equation [eq_rel], we do not care about the exact value of
\({\bf{Z}}({\bf{x}}_{\alpha \, i}^k)\) as long as it is constant at
all points \({\bf{x}}_{\alpha \, i}^k\). Therefore, the random
function **Z** in the CoKriging system (equation [krig_sys]) can be
substituted by equation [eq_rel]. This formulation entails that the
specific *scalar field values* depend only on the gradients, and
hence at least one gradient is necessary to keep the system of equations
defined. The reason for this formulation is that, by not fixing the
values of each interface \({\bf{Z}}({\bf{x}}_{\alpha}^k)\), the
compression of layers—which is derived from the gradients—can propagate
more smoothly beyond the given interfaces. Otherwise, the gradients would
only have an effect in the area within the boundaries of the two
interfaces that contain the variable.

The algebraic dependency between **Z** and
\(\partial {\bf{Z}}/ \partial u\) (equation [eq_der]) gives a
mathematical definition of the relation between the two variables,
avoiding the need for an empirical cross-variogram and enabling instead
the use of the derivative of the covariance function. This dependency
must also be taken into consideration in the computation of the drift of
the first moment, which requires a different function for each of the
variables:

\[\mathbb{E}\left[{\bf{Z}}({\bf{x}})\right] = F_1({\bf{x}}), \qquad \mathbb{E}\left[\frac{\partial {\bf{Z}}}{\partial u}({\bf{x}})\right] = F_2({\bf{x}}) = \frac{\partial F_1}{\partial u}({\bf{x}})\]

where \(F_1\) is the polynomial of degree \(n\) and \(F_2\) its derivative. Having taken this into consideration, the resulting CoKriging system takes the form:

\[
\begin{bmatrix}
{\bf{C_{\partial {\bf{Z}}/ \partial u, \, \partial {\bf{Z}}/ \partial v}}} & {\bf{C_{\partial {\bf{Z}}/ \partial u, \, Z}}} & {\bf{U'_{\partial {\bf{Z}}/ \partial u}}} \\
{\bf{C_{Z, \, \partial {\bf{Z}}/ \partial u}}} & {\bf{C_{\bf{Z}, \, \bf{Z}}}} & {\bf{U_{Z}}} \\
{\bf{U'_{\partial {\bf{Z}}/ \partial u}}}^{T} & {\bf{U_{Z}}}^{T} & {\bf{0}}
\end{bmatrix}
\begin{bmatrix}
\lambda_{\partial {\bf{Z}}/ \partial u, \, \partial {\bf{Z}}/ \partial v} & \lambda_{Z, \, \partial {\bf{Z}}/ \partial u} \\
\lambda_{\partial {\bf{Z}}/ \partial u, \, Z} & \lambda_{\bf{Z}, \, \bf{Z}} \\
\mu_{\partial {\bf{Z}}/ \partial u} & \mu_{Z}
\end{bmatrix}
=
\begin{bmatrix}
{\bf{c_{\partial {\bf{Z}}/ \partial u, \, \partial {\bf{Z}}/ \partial v}}} & {\bf{c_{Z, \, \partial {\bf{Z}}/ \partial u}}} \\
{\bf{c_{\partial {\bf{Z}}/ \partial u, \, Z}}} & {\bf{c_{\bf{Z}, \, \bf{Z}}}} \\
{\bf{f_{10}}} & {\bf{f_{20}}}
\end{bmatrix}
\]

where \({\bf{C_{\partial {\bf{Z}}/ \partial u}}}\) is the gradient covariance matrix; \({\bf{C_{\bf{Z}, \, \bf{Z}}}}\) the covariance matrix of the differences between each interface point and the reference point of its layer (see Appendix [interface-covariance-matrix] for further analysis); \({\bf{C_{Z, \, \partial {\bf{Z}}/ \partial u }}}\) encapsulates the cross-covariance function; and \({\bf{U_{Z}}}\) and \(\bf{U'_{\partial {\bf{Z}}/ \partial u}}\) are the drift functions and their gradient, respectively. On the right-hand side we find the vector of the system of equations, where \({\bf{c_{\partial {\bf{Z}}/ \partial u, \, \partial {\bf{Z}}/ \partial v}}}\) is the gradient of the covariance function at the point \({\bf{x}}_0\) of interest; \({\bf{c_{Z, \,\partial {\bf{Z}}/ \partial u}}}\) the cross-covariance; \({\bf{c_{\bf{Z}, \,\bf{Z}}}}\) the actual covariance function; and \({\bf{f_{10}}}\) and \({\bf{f_{20}}}\) the gradients of the drift functions and the drift functions themselves. Lastly, the unknown vectors are formed by the corresponding weights, \(\lambda\), and the constants of the drift functions, \(\mu\). A more detailed inspection of this system of equations is carried out in Appendix [kse].

As we can see in equation [krig_sys], it is possible to solve the
Kriging system for the scalar field **Z** (second column in the weights
vector), as well as its derivative \(\partial {\bf{Z}}/ \partial u\)
(first column in the weights vector). Even though the main goal is the
segmentation of the layers, which is done using the value of **Z** (see
Section [from-potential-field-to-block]), the gradient of the scalar
field can be used for further mathematical applications, such as
meshing, geophysical forward calculations or locating geological
structures of interest (e.g. spill points of a hydrocarbon trap).

Furthermore, since the choice of covariance parameters is ad hoc (Appendix [covariance-function-cubic.-discuss-it-with-france] shows the covariance function used in GemPy), the uncertainty derived from the Kriging interpolation does not bear any physical meaning. This fact promotes the idea of using only the mean value of the Kriging solution. For this reason, it is recommended to solve the Kriging system (equation [krig_sys]) in its dual form [B5].

## Geological model interpolation using *GemPy*

### From scalar field to geological block model

In most scenarios, the goal of structural modeling is to define the spatial distribution of geological structures, such as layer interfaces and faults. In practice, this segmentation is usually done either by using a volumetric discretization or by depicting the interfaces as surfaces.

The result of the Kriging interpolation is the random function \(\textbf{Z}(x)\) (and its gradient \(\partial {\bf{Z}} / \partial u (x)\), which we will omit in the following), which allows the evaluation of the scalar field at any given point \(x\) in space. From this point on, the easiest way to segment the domains is to discretize the 3-D space (e.g. we use a regular grid in figure [fig:model_comp]). First, we calculate the scalar value at every interface by computing \({\bf{Z}}({\bf{x}}_{\alpha \, i}^k)\) for every interface \(k\). Once we know the value of the scalar field at the interfaces, we evaluate every point of the mesh and compare their values to those at the interfaces, assigning every point of the mesh to a topological volume. Each of these compartments represents an individual domain, that is, a lithology of interest (see figure [fig:model_comp]a).
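The comparison step above can be sketched with NumPy. The scalar field and interface values below are toy numbers, not GemPy's internal code:

```python
import numpy as np

# Toy 1-D column of a regular grid: scalar field evaluated at every cell,
# then bracketed by the scalar values found at the interfaces.
depth = np.linspace(0.0, -10.0, 101)      # vertical grid coordinates
scalar_field = -depth                     # toy scalar field Z(x)
interface_values = np.array([3.0, 7.0])   # Z evaluated at the two interfaces

# Each cell is assigned the id of the interval its scalar value falls into,
# yielding one integer (lithology id) per cell.
lith_block = np.digitize(scalar_field, interface_values)
```

Here `np.digitize` performs the whole "compare against the interface values" loop at once, producing ids 0, 1 and 2 for the three lithological domains.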

At the time of this manuscript preparation, *GemPy* only provides
rectilinear grids, but it is important to note that the computation of
the scalar field happens in continuous space, which allows the
use of any type of mesh. The result of this type of segmentation is
referred to in *GemPy* as a *lithology block*.

The second segmentation alternative consists of locating the layer
isosurfaces. *GemPy* makes use of the marching cubes algorithm
[B7] provided by the *scikit-image*
library [B8]. The basics of the marching
cubes algorithm are quite intuitive: (i) first, we discretize the volume
into 3-D voxels and check whether the value of the isosurface
we want to extract falls within the boundaries of each single voxel; (ii)
if so, for each edge of the voxel we interpolate the values at the
corners of the cube to obtain the coordinates of the intersection
between the edges of the voxel and the isosurface of interest, commonly
referred to as vertices; (iii) those intersections are analyzed and
compared against all possible configurations to define the simplices
(i.e. the vertices which form an individual polygon) of the triangles.
Once we obtain the coordinates of the vertices and their corresponding
simplices, we can use them for visualization (see Section [vis]) or any
subsequent computation that may make use of them (e.g. weighted
voxels). For more information on meshing algorithms refer to
[B9].
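The vertices-and-simplices extraction described above can be reproduced directly with *scikit-image*. The scalar field below (a sphere of radius 5), the grid size and the isosurface level are arbitrary choices for illustration:

```python
import numpy as np
from skimage import measure

# Toy scalar field: distance from the origin, sampled on a 33^3 grid
# spanning [-8, 8] in each direction (grid spacing 0.5 per index).
x, y, z = np.mgrid[-8:8:33j, -8:8:33j, -8:8:33j]
field = np.sqrt(x**2 + y**2 + z**2)

# vertices: intersection points on the voxel edges;
# simplices: index triplets forming the triangles of the isosurface.
vertices, simplices, normals, values = measure.marching_cubes(field, level=5.0)
```

Note that `marching_cubes` returns vertices in index coordinates by default; the `spacing` argument can be used to scale them back to physical units.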

```
import gempy as gp
# Main data management object containing the input data
geo_data = gp.create_data(extent=[0, 20, 0, 10, -10, 0],
                          resolution=[100, 10, 100],
                          path_o="paper_Foliations.csv",
                          path_i="paper_Points.csv")
# Creating object with data prepared for interpolation and compiling
interp_data = gp.InterpolatorData(geo_data)
# Computing the model
lith, fault = gp.compute_model(interp_data)
# Plotting result: scalar field
gp.plot_scalar_field(geo_data, lith[1], 5, plot_data=True)
# Plotting result: lithology block
gp.plot_section(geo_data, lith[0], 5, plot_data=True)
# Getting vertices and faces
vertices, simplices = gp.get_surfaces(interp_data, lith[1], [fault[1]],
                                      original_scale=True)
```

### Combining scalar fields: Depositional series and faults

In reality, most geological settings are formed by a concatenation of depositional phases partitioned by unconformity boundaries and subjected to tectonic stresses which displace and deform the layers. While the interpolation is able to represent realistic folding—given enough data—the method fails to describe discontinuities. To overcome this limitation, it is possible to combine several scalar fields to recreate the desired result.

So far, the discontinuities implemented in *GemPy* are unconformities and
infinite faults. Both types are computed by specific combinations of
independent scalar fields. We call these independent scalar fields
*series* (from stratigraphic series, in accordance with the use in
GeoModeller 3-D). In essence, they represent a subset of grouped
interfaces—either layers or fault planes—that are interpolated together
and whose spatial locations therefore affect each other. To handle and
visualize these relationships, we use a so-called sequential pile,
representing the order—from the first scalar field to the last—and the
grouping of the layers (see figure [fig:model_comp]).

Modeling unconformities is rather straightforward. Once we have grouped the layers into their respective series, younger series will overlay older ones beyond the unconformity. The scalar fields themselves, computed for each of these series, could be seen as a continuous depositional sequence in the absence of an unconformity.
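The override logic can be sketched as follows (the formation ids and arrays are made up; this is not GemPy's internal implementation):

```python
import numpy as np

# Two series evaluated on the same toy 1-D column of cells.
# In the younger series, 0 marks cells below its own unconformity surface.
younger = np.array([1, 1, 0, 0, 0])   # formation ids from the younger series
older   = np.array([2, 2, 2, 3, 3])   # formation ids from the older series

# Younger series overrides the older one wherever it is present;
# everywhere else the older series fills the volume.
combined = np.where(younger > 0, younger, older)
```

The key point is that each series is interpolated independently; the unconformity only appears at the combination step.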

```
import gempy as gp
# Main data management object containing the input data
geo_data = gp.create_data(extent=[0, 20, 0, 10, -10, 0],
                          resolution=[100, 10, 100],
                          path_o="paper_Foliations.csv",
                          path_i="paper_Points.csv")
# Defining the series of the sequential pile
gp.set_series(geo_data,
              {'younger_serie': 'Unconformity', 'older_serie': ('Layer1', 'Layer2')},
              order_formations=['Unconformity', 'Layer2', 'Layer1'])
# Creating object with data prepared for interpolation and compiling
interp_data = gp.InterpolatorData(geo_data)
# Computing the model
lith, fault = gp.compute_model(interp_data)
# Plotting result
gp.plot_section(geo_data, lith[0], 5, plot_data=True)
```

Faults are modeled by the inclusion of an extra drift term into the kriging system [B10]:

which is a function of the faulting structure. This means that for every location \({\bf{x}}_{0}\), the drift function will take a value depending on the fault compartment—i.e. a segmented domain of the fault network—and on other geometrical constraints, such as the spatial influence of a fault or the variability of the offset. To obtain the offset effect of a fault, the value of the drift function has to be different on each of its sides. The level of complexity of the drift functions will determine the quality of the characterization as well as its robustness. Furthermore, finite or localized faults can be recreated by selecting an adequate function that describes those specific trends.
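The offset effect of the simplest such drift, a constant value per fault compartment, can be illustrated in one dimension (all values below are made up):

```python
import numpy as np

# Toy 1-D profile crossing a fault at x = 5: the drift term takes a
# different value in each fault compartment, which shifts the scalar
# field and hence offsets every isosurface across the fault plane.
x = np.linspace(0.0, 10.0, 11)
fault_block = (x > 5.0).astype(int)     # fault compartment of each point
offset = 3.0                            # constant drift contribution
scalar_field = 0.5 * x + offset * fault_block
```

Every isovalue of `scalar_field` jumps by the same amount across the fault, which is exactly the constant-offset limitation discussed further below.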

```
import gempy as gp
# Main data management object containing the input data
geo_data = gp.create_data(extent=[0, 20, 0, 10, -10, 0],
                          resolution=[100, 10, 100],
                          path_o="paper_Foliations.csv",
                          path_i="paper_Points.csv")
# Defining the series of the sequential pile
gp.set_series(geo_data, series_distribution={'fault_serie1': 'fault1',
                                             'younger_serie': 'Unconformity',
                                             'older_serie': ('Layer1', 'Layer2')},
              order_formations=['fault1', 'Unconformity', 'Layer2', 'Layer1'])
# Creating object with data prepared for interpolation and compiling
interp_data = gp.InterpolatorData(geo_data)
# Computing the model
lith, fault = gp.compute_model(interp_data)
# Plotting result
gp.plot_section(geo_data, lith[0], 5, plot_data=True)
# Getting vertices and faces and plotting
vertices, simplices = gp.get_surfaces(interp_data, lith[1], [fault[1]],
                                      original_scale=True)
gp.plot_surfaces_3D(geo_data, vertices, simplices)
```

The computation of the segmentation of fault compartments (called *fault
block* in *GemPy*)—prior to the inclusion of the fault drift functions,
which depend on this segmentation—can be performed with the
potential-field method itself. In the case of multiple faults,
individual drift functions have to be included in the kriging system for
each fault, representing the subdivision of space that they produce.
Naturally, younger faults may offset older tectonic events. This
behavior is replicated by recursively adding the drift functions of
younger faults to the computation of the older *fault blocks*. To date,
the fault relations—i.e. which faults offset others—are described by the
user in a boolean matrix. An easy-to-use implementation to generate fault
networks was being worked on at the time of manuscript preparation. An
important detail to consider is that drift functions will bend the
isosurfaces according to the given rules, but they will preserve their
continuity. This differs from the intuitive idea of an offset, where the
interface presents a sharp jump. This fact has a direct impact on the
geometry of the final model and can, for example, affect certain
meshing algorithms. Furthermore, in the ideal case of choosing the
perfect drift function, the isosurface would bend exactly along the
fault plane. At its current state, *GemPy* only includes the addition
of an arbitrary integer to each segmented volume. This limits the
quality to a constant offset, decreasing the sharpness of the offset as
the data deviate from that constraint. Any deviation from this
theoretical concept results in a bending of the layers as they approach
the fault plane to accommodate the data, potentially leading to overly
smooth transitions around the discontinuity.

## “Under the hood”: The *GemPy* architecture

### The graph structure

The architecture of *GemPy* follows the Python Software Foundation
recommendations of modularity and reusability
[B11]. The aim is to divide all functionality
into independent small logical units in order to avoid duplication,
facilitate readability and make changes to the code base easier.

The design of *GemPy* revolves around an automatic differentiation (AD)
scheme. The main constraint is that the mathematical functions need to
be continuous from the input parameters (in probabilistic jargon, priors)
to the cost function (or likelihoods), and therefore the code must be
written in the same language (or at the very least a compatible one) to
automatically compute the derivatives. In practice, this entails that
any operation involved in the AD must be coded symbolically using the
library *Theano* (see Section [theano] for further details). One of the
constraints of writing symbolically is the a priori declaration of the
possible input parameters of the graph, which will behave as latent
variables—i.e. the parameters we try to tune for optimization or
uncertainty quantification—while leaving other involved parameters
constant, either due to their nature or because of the relatively slight
impact of their variability. This rigidity dictates the whole design of
the input data management, which needs to revolve around the preexisting
symbolic graph.

*GemPy* encapsulates the creation of the symbolic graph in its module
`theanograph`. Due to the significant complexity of programming
symbolically, features shipped in *GemPy* that rely heavily on external
libraries are not written in *Theano*. The current functionality written
in *Theano* can be seen in figure [fig:overall]; essentially, it
encompasses the entire interpolation of the geological modeling (section
[kriging]) as well as the forward calculation of gravity (section
[gravity]).

Regarding data structure, we make use of the Python package *pandas*
[B12] to store and prepare the input
data for the symbolic graph (red nodes in figure [fig:overall]) or other
processes, such as visualization. All the methodology to create, export
and manipulate the original data is encapsulated in the class
`DataManagement`. This class has several child classes to facilitate
specific precomputation manipulations of data structures (e.g. for
meshing). The aim is to have all constant data prepared before any
inference or optimization is carried out, to minimize the computational
overhead of each iteration as much as possible.

It is important to keep in mind that, in this structure, once data
enters the part of the symbolic graph, only algebraic operations are
allowed. This limits the use of many high-level coding structures (e.g.
dictionaries or undefined loops) and external dependencies. As a result
of that, the preparation of data must be exhaustive before starting the
computation. This includes ordering the data within the arrays, passing
the exact lengths of the subsets we will need later on during the
interpolation, or the calculation of many necessary constant parameters.
The preprocessing of data is done within the sub-classes of
`DataManagement`: the `InterpolatorData` class—an instance of which is
used to call the *Theano* graph—and the `InterpolatorClass`—which
creates the *Theano* variables and compiles the symbolic graph.

The rest of the package is formed by an ever-growing series of modules that perform different tasks using the geological model as input (see Section [sec:model-analys-furth] and the assets area in figure [fig:overall]).

### Theano

Efficiently solving a large number of algebraic equations, and
especially their derivatives, can easily get unmanageable in terms of
both time and memory. Up to this point, we have referenced *Theano* and
related terms such as automatic differentiation and symbolic programming
many times. In this section we will motivate its use and explain why
its capabilities make all the difference in making implicit geological
modeling available for uncertainty analysis.

*Theano* is a Python package that takes over many of the optimization
tasks in order to create a computationally feasible code implementation.
*Theano* relies on the creation of symbolic graphs that represent the
mathematical expressions to compute. Most of the widespread programming
paradigms (e.g. procedural and object-oriented programming; see
[B13]) are executed sequentially without any interaction with the
subsequent instructions. In other words, a later instruction has access
to the memory states but is clueless about the previous instructions
that have modified those states. In contrast, symbolic programming
defines, from the beginning to the end, not only the primary data
structures but also the complete logic of a function, which in turn
enables the optimization (e.g. removing redundancy) and manipulation
(e.g. taking derivatives) of its logic.

Within the Python implementation, *Theano* creates an acyclic network
graph where the parameters are represented by nodes, while the
connections determine the mathematical operators that relate them. The
creation of the graph is done in the class `theanograph`. Each
individual method corresponds to a piece of the graph, starting from the
input data all the way to the geological model or the forward gravity
(see figure [fig:overall], purple Theano area).

The symbolic graph is later analyzed to perform the optimization, the symbolic differentiation, and the compilation to a language faster than Python (C or CUDA). This process is computationally demanding and therefore must be avoided as much as possible.

Among the most notable optimizers shipped with *Theano*
[B1], we
can find: (i) the canonicalization of operations to reduce the
number of duplicated computations; (ii) specialization of operations to
improve consecutive element-wise operations; (iii) in-place operations
to avoid duplication of memory; and (iv) OpenMP parallelization for CPU
computations. These optimizations and more can speed up the code by an
order of magnitude.

However, although *Theano* code optimization is useful, the real
game-changer is its capability to perform automatic differentiation.
There is extensive literature explaining all the ins and outs and
intuitions of the method, since it is a core algorithm for training
neural networks (e.g. a detailed explanation is given by
[B14]).
Here, we will highlight the main differences with numerical approaches
and how they can be used to improve the modeling process.

Many of the most advanced algorithms in computer science rely on an inverse framework, i.e. the result of a forward computation \(f(\textbf{x})\) influences the value of one or many of the latent variables \(\textbf{x}\) (e.g. neural networks, optimizations, inferences). The most emblematic example of this is the optimization of a cost function. All these problems can be described as an exploration of a multidimensional manifold \(f: \mathbb{R}^N \rightarrow \mathbb{R}\). Hence the gradient of the function \(\nabla f = \left(\frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, ..., \frac{\partial f}{\partial x_n}\right)\) becomes key for an efficient analysis. In case the output is also multidimensional—i.e. \(f: \mathbb{R}^N \rightarrow \mathbb{R}^M\)—the entire manifold gradient can be expressed by the Jacobian matrix

\[J = \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_N} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_M}{\partial x_1} & \cdots & \frac{\partial f_M}{\partial x_N} \end{bmatrix}\]

of dimension \(N \cdot M\), where \(N\) is the number of variables and \(M\) the number of functions that depend on those variables. Now the question is how to compute the Jacobian matrix in a consistent and efficient manner. The most straightforward methodology consists in approximating each derivative by numerical differentiation, applying finite-difference approximations, for example a forward FD scheme:

\[\frac{\partial f}{\partial x_i} \approx \frac{f(x_1, \ldots, x_i + h, \ldots, x_N) - f(x_1, \ldots, x_i, \ldots, x_N)}{h}\]

where \(h\) is a discrete increment. The main advantage of numerical differentiation is that it only computes \(f\)—evaluated for different values of \(x\)—which makes it very easy to implement in any available code. By contrast, a drawback is that for every element of the Jacobian we introduce an approximation error that can eventually lead to mathematical instabilities. But above all, the main limitation is the need for \(2 \cdot M \cdot N\) evaluations of the function \(f\), which quickly becomes prohibitively expensive in high-dimensional problems.
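A minimal forward-difference Jacobian makes the evaluation count concrete (the function and parameter names below are illustrative):

```python
import numpy as np

def jacobian_fd(f, x, h=1e-6):
    """Forward finite-difference Jacobian of f at x.

    Costs one baseline evaluation of f plus one extra evaluation per
    input dimension, i.e. N + 1 calls for f: R^N -> R^M.
    """
    x = np.asarray(x, dtype=float)
    f0 = np.atleast_1d(f(x))
    J = np.empty((f0.size, x.size))
    for i in range(x.size):
        xh = x.copy()
        xh[i] += h                       # perturb one variable at a time
        J[:, i] = (np.atleast_1d(f(xh)) - f0) / h
    return J
```

Each column requires its own perturbed evaluation, which is exactly the scaling with \(N\) that makes this approach impractical in high dimensions.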

The alternative is to create the symbolic differentiation of \(f\). This encompasses decomposing \(f\) into its primal operators and applying the chain rule to the corresponding transformations, following the rules of differentiation, to obtain \(f'\). However, symbolic differentiation is not enough, since the application of the chain rule leads to exponentially large expressions of \(f'\) in what is known as “expression swell” [B15]. Luckily, these large symbolic expressions have a high level of redundancy in their terms. This makes it possible to exploit the redundancy by storing the repeated intermediate steps in memory and simply invoking them when necessary, instead of computing the whole graph every time. This division of the program into sub-routines that store intermediate results—which are invoked several times—is called dynamic programming [B16]. The simplified symbolic differentiation graph is ultimately what is called automatic differentiation [B14]. Additionally, in a multivariate/multi-objective case, the benefits of using AD increase linearly as the difference between the number of parameters \(N\) and the number of objective functions \(M\) grows. By applying the same principle of redundancy explained above—this time to intermediate steps shared across multiple variables or multiple objective functions—it is possible to reduce the number of evaluations necessary to compute the Jacobian either to \(N\) in forward-propagation or to \(M\) in back-propagation, plus a small overhead on the evaluations (for a more detailed description of the two modes of AD see [B15]).
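The principle of propagating derivatives alongside values can be sketched with dual numbers, the simplest (forward-mode) flavor of AD; note that Theano itself implements the reverse mode:

```python
# Minimal forward-mode AD via dual numbers: each value carries its
# derivative, and the chain rule is applied once per primitive operation.
# A sketch of the principle only, not Theano's implementation.
class Dual:
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def _wrap(self, other):
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        o = self._wrap(other)
        return Dual(self.val + o.val, self.dot + o.dot)
    __radd__ = __add__

    def __mul__(self, other):
        o = self._wrap(other)
        # product rule: (uv)' = u'v + uv'
        return Dual(self.val * o.val, self.dot * o.val + self.val * o.dot)
    __rmul__ = __mul__

def f(x):
    return x * x * x + 2 * x        # f'(x) = 3x^2 + 2

result = f(Dual(2.0, 1.0))          # seed dx/dx = 1
# result.val == 12.0 (f(2)), result.dot == 14.0 (f'(2))
```

Each intermediate value is computed exactly once together with its derivative, which is the redundancy-sharing idea described above in its most compact form.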

*Theano* provides a direct implementation of the back-propagation
algorithm, which means in practice that a new graph of similar size is
generated per cost function (or, in the probabilistic inference, per
likelihood function). Therefore, the computational time is independent
of the number of input parameters, opening the door to solving
high-dimensional problems.

[B1] | (1, 2) Theano Development Team. Theano: A Python framework for fast computation of mathematical expressions. arXiv e-prints, May 2016. URL: http://arxiv.org/abs/1605.02688. |

[B2] | (1, 2) Christian Lajaunie, Gabriel Courrioux, and Laurent Manuel. Foliation fields and 3D cartography in geology: Principles of a method based on potential interpolation. Mathematical Geology, 29(4):571–584, 1997. |

[B3] | Philippe Calcagno, Jean-Paul Chiles, Gabriel Courrioux, and Antonio Guillen. Geological modelling from field data and geological knowledge: Part I. Modelling method coupling 3D potential-field interpolation and geological rules: Recent Advances in Computational Geodynamics: Theory, Numerics and Applications. Physics of the Earth and Planetary Interiors, 171(1-4):147–157, 2008. |

[B4] | Jean-Paul Chiles and Pierre Delfiner. Geostatistics: modeling spatial uncertainty. Volume 497. John Wiley & Sons, 2009. |

[B5] | (1, 2) Georges Matheron. Splines and kriging: their formal equivalence. Down-to-earth-statistics: Solutions looking for geological problems, pages 77–95, 1981. |

[B6] | Hans Wackernagel. Multivariate geostatistics: an introduction with applications. Springer Science & Business Media, 2013. |

[B7] | William E Lorensen and Harvey E Cline. Marching cubes: a high resolution 3d surface construction algorithm. In ACM siggraph computer graphics, volume 21, 163–169. ACM, 1987. |

[B8] | Stéfan van der Walt, Johannes L. Schönberger, Juan Nunez-Iglesias, François Boulogne, Joshua D. Warner, Neil Yager, Emmanuelle Gouillart, Tony Yu, and the scikit-image contributors. Scikit-image: image processing in Python. PeerJ, 2:e453, 6 2014. URL: http://dx.doi.org/10.7717/peerj.453, doi:10.7717/peerj.453. |

[B9] | Christophe Geuzaine and Jean-François Remacle. Gmsh: a 3-d finite element mesh generator with built-in pre-and post-processing facilities. International journal for numerical methods in engineering, 79(11):1309–1331, 2009. |

[B10] | Alain Marechal. Kriging seismic data in presence of faults. In Geostatistics for natural resources characterization, pages 271–294. Springer, 1984. |

[B11] | Guido van Rossum, Barry Warsaw, and Nick Coghlan. PEP 8: Style Guide for Python Code. Python.org, 2001. |

[B12] | Wes McKinney. Pandas: a foundational python library for data analysis and statistics. Python for High Performance and Scientific Computing, pages 1–9, 2011. |

[B13] | Kurt Normark. Overview of the four main programming paradigms. 2013. |

[B14] | (1, 2) Atilim Gunes Baydin, Barak A Pearlmutter, Alexey Andreyevich Radul, and Jeffrey Mark Siskind. Automatic differentiation in machine learning: a survey. arXiv preprint arXiv:1502.05767, 2015. |

[B15] | (1, 2) Joel S Cohen. Computer algebra and symbolic computation: Mathematical methods. Universities Press, 2003. |

[B16] | Richard Bellman. Dynamic programming. Courier Corporation, 2013. |