Introduction

We commonly capture our knowledge about relevant geological features in the subsurface in the form of geological models, as 3-D representations of the geometric structural setting. Computer-aided geo logical modeling methods have existed for decades, and many advanced and elaborate commercial packages exist to generate these models (e.g. GoCAD, Petrel, GeoModeller). But even though these packages partly enable an external access to the modeling functionality through implemented API’s or scripting interfaces, it is a significant disadvantage that the source code is not accessible, and therefore the true inner workings are not clear. More importantly still, the possibility to extend these methods is limited—and, especially in the current rapid development of highly efficient open-source libraries for machine learning and computational inference (e.g. TensorFlow, Stan, pymc, PyTorch, Infer.NET), the integration into other computational frameworks is limited.

Yet, there is to date no fully flexible open-source project which integrates state-of-the-art geological modeling methods. Conventional 3-D construction tools (CAD, e.g. pythonOCC, PyGem) are only useful to a limited extent, as they do not consider the specific aspects of subsurface structures and the inherent sparcity of data. Open source GIS tools exist (e.g. QGIS, gdal), but they are typically limited to 2-D (or 2.5-D) structures and do not facilitate the modeling and representation of fault networks, complex structures like overturned folds or dome structures), or combined stratigraphic sequences.

Example of models generated using GemPy. a) Synthetic model representing a reservoir trap, visualized in Paraview [A1]; b) Geological model of the Perth basin (Australia) rendered using GemPy on the in-built Python in Blender (see appendix [blender] for more details), spheres and cones represent the input data.

With the aim to close this gap, we present here GemPy, an open-source implementation of a modern and powerful implicit geological modeling method based on a potential-field approach, found, in turn, on a Universal CoKriging interpolation [A2][Calcagno.2008]. In distinction to surface-based modeling approaches see [A3] for a good overview, these approaches allow the direct interpolation of multiple conformal sequences in a single scalar field, and the consideration of discontinuities (e.g. metamorphic contacts, unconformities) through the interaction of multiple sequences [A2][Mallet.2004][Calcagno.2008][Caumon:2010jp][Hillier:2014gf]. Also, these methods allow the construction of complex fault networks and enable, in addition, a direct global interpolation of all available geological data in a single step. This last aspect is relevant, as it facilitates the integration of these methods into diverse other workflows. Most importantly, we show here how we can integrate the method into novel and advanced machine learning and Bayesian inference frameworks [A4] for stochastic geomodeling and Bayesian inversion. Recent developments in this field have seen a surge in new methods and frameworks, for example using gradient-based Monte Carlo methods [A5] or variational inferences [A6], making use of efficient implementations of automatic differentiation [A7] in novel machine learning frameworks. For this reason, GemPy is built on top of Theano, which provides not only the mentioned capacity to compute gradients efficiently, but also provides optimized compiled code (for more details see Section [theano]). In addition, we utilize pandas for data storage and manipulation [A8], Visualization Toolkit (vtk) Python-bindings for interactive 3-D visualization [A9], the de facto standard 2-D visualization library Matplotlib [A10] and NumPy for efficient numerical computations [A11]. Our implementation is specifically intended for combination with other packages, to harvest efficient implementations in the best possible way.

Especially in this current time of rapid developments of open-source scientific software packages and powerful machine learning frameworks, we consider an open-source implementation of a geological modeling tool as essential. We therefore aim to open up this possibility to a wide community, by combining state-of-the-art implicit geological modeling techniques with additional sophisticated Python packages for scientific programming and data analysis in an open-source ecosystem. The aim is explicitly not to rival the existing commercial packages with well-designed graphical user interfaces, underlying databases, and highly advanced workflows for specific tasks in subsurface engineering, but to provide access to an advanced modeling algorithm for scientific experiments in the field of geomodeling.

In the following, we will present the implementation of our code in the form of core modules, related to the task of geological modeling itself, and additional assets, which provide the link to external libraries, for example to facilitate stochastic geomodeling and the inversion of structural data. Each part is supported/ supplemented with Jupyter Notebooks that are available as additional online material and part of the package documentation, which enable the direct testing of our methods (see Section  [sec:jupyter-notebooks]). These notebooks can also be executed directly in an online environment (Binder). We encourage the reader to use these tutorial Jupyter Notebooks to follow along the steps explained in the following. We encourage the reader to use these tutorial Jupyter Notebooks to follow along the steps explained in the following. Finally, we discuss our approach, specifically also with respect to alternative modeling approaches in the field, and provide an outlook to our planned future work for this project.

[A1]Fabian A Stamm. Bayesian decision theory in structural geological modeling - how reducing uncertainties affects reservoir value estimations. Master’s thesis, RWTH Aachen University, Aachen, Germany, 2017.
[A2](1, 2) Christian Lajaunie, Gabriel Courrioux, and Laurent Manuel. Foliation fields and 3D cartography in geology: Principles of a method based on potential interpolation. Mathematical Geology, 29(4):571–584, 1997.
[A3]G. Caumon, P. Collon-Drouaillet, C. Le Carlier de Veslud, S. Viseur, and J. Sausse. Surface-Based 3D Modeling of Geological Structures. Mathematical Geosciences, 41(8):927–945, 2009.
[A4]John Salvatier, Thomas V Wiecki, and Christopher Fonnesbeck. Probabilistic programming in Python using PyMC3. PeerJ Computer Science, 2(2):e55, April 2016.
[A5]Simon Duane, Anthony D Kennedy, Brian J Pendleton, and Duncan Roweth. Hybrid monte carlo. Physics letters B, 195(2):216–222, 1987.
[A6]Alp Kucukelbir, Dustin Tran, Rajesh Ranganath, Andrew Gelman, and David M Blei. Automatic differentiation variational inference. arXiv preprint arXiv:1603.00788, 2016.
[A7]Louis B Rall. Automatic differentiation: techniques and applications. 1981.
[A8]Wes McKinney. Pandas: a foundational python library for data analysis and statistics. Python for High Performance and Scientific Computing, pages 1–9, 2011.
[A9]Will J Schroeder, Bill Lorensen, and Ken Martin. The visualization toolkit: an object-oriented approach to 3D graphics. Kitware, 2004.
[A10]John D Hunter. Matplotlib: a 2d graphics environment. Computing In Science & Engineering, 9(3):90–95, 2007.
[A11]Stéfan van der Walt, S Chris Colbert, and Gael Varoquaux. The numpy array: a structure for efficient numerical computation. Computing in Science & Engineering, 13(2):22–30, 2011.