Scientific Coding: Tools You Might Not Know About

Ellen M. Price

Pizza Ponder, 14 August 2018

Prerequisites

Ground rules

  • Throughout this talk, I'll make lots of recommendations, but it's entirely possible I've left out your favorite tool! I apologize in advance.
  • If you think I've forgotten something important, chime in!
  • And please ask questions if something isn't clear.

Disclaimer

Much of what I'll discuss today requires downloading and building custom software because the CF doesn't provide it. This is typically fine, because we don't have the privileges necessary to do any real harm. But if you feel uncomfortable about it, just ask them first!

Basic software

To make any of this work, you'll probably need...

  • A modern compiler (gcc)
  • A modern Python setup (use 2.7 or 3+)
  • CMake (required for some builds)
  • Git for version control and easy forks
  • OpenMPI or MPICH for parallelization

A note about Python

  • The default user setup doesn't allow installing packages via pip like you could on your own computer
  • I've gotten around this by using virtualenv, but other tools work, too

Mathematical Libraries

Recommendation: GSL

  • There comes a time when NumPy/SciPy can't do what you need
  • GSL is a C library (with Python bindings) that can do:
    • Special function evaluation (example: Bessel functions)
    • Linear algebra
    • Numerical integration and differentiation
    • Root-finding in one or more dimensions
    • ... and much more!
  • GSL documentation

GSL C interface


#include <stdio.h>
#include <gsl/gsl_sf_bessel.h>

int main(void)
{
    double x = 5.0;
    double expected = -0.17759677131433830434739701;

    double y = gsl_sf_bessel_J0(x);

    printf("J0(5.0) = %.18f\n", y);
    printf("exact   = %.18f\n", expected);

    return 0;
}
                        

Recommendation: FFTW3

  • Well-known but still worth mentioning
  • Specialized FFT software (likely better than GSL)
  • Uses “wisdom” to perform many FFTs that have similar properties in an efficient way
  • FFTW3 documentation

Recommendation: PETSc

  • Complete, parallel C package built around the concept of sparse matrices
  • Divided into multiple submodules:
    • PC: preconditioners
    • KSP: Krylov linear solvers
    • SNES: nonlinear solvers
    • TS: ODE timesteppers
  • Change solvers on the fly using command line options
  • PETSc documentation

Recommendation: SUNDIALS

  • SUite of Nonlinear and DIfferential/ALgebraic Equation Solvers
  • Can solve normal ODEs as well as DAEs
  • Multiple modules for different use cases
  • SUNDIALS documentation

SUNDIALS CVODE module

  • Probably the one you want for solving ODEs
  • Two options: Stiff (BDF) or non-stiff (Adams) methods
  • Automagically detects the optimal step-size and adapts as necessary — no need for user guesses
  • Easily interfaces with external linear solvers

Recommendation: FEniCS

  • Powerful PDE solver built on PETSc, Trilinos, Eigen, and others
  • Requires some knowledge of finite elements (need to cast your problem in variational form)
  • FEniCS documentation

FEniCS interface


# Define function space
P2 = VectorElement('P', tetrahedron, 2)
P1 = FiniteElement('P', tetrahedron, 1)
TH = P2 * P1
W = FunctionSpace(mesh, TH)

# Define variational problem
(u, p) = TrialFunctions(W)
(v, q) = TestFunctions(W)
a = inner(grad(u), grad(v))*dx - p*div(v)*dx + div(u)*q*dx
L = dot(f, v)*dx

# Compute solution
w = Function(W)
solve(a == L, w, [bc1, bc0])
                        

Data storage formats

How to store your groundbreaking new data?

  • Sometimes a CSV or text file is fine!
  • But what about...
    • Data on an irregular mesh?
    • Complex data structures?
  • NumPy zip files can store multiple datasets, but they can only be read by other NumPy users!

Recommendation: VTK or SILO

  • Good for 2- and 3-dimensional datasets
  • Supports structured and unstructured meshes
  • Values defined on points or cells of a mesh
  • VTK examples and SILO examples

Example: SILO mesh

Example: SILO data

VTK the hard way


#include <vtkSmartPointer.h>
#include <vtkXMLUnstructuredGridReader.h>
#include <vtkDataSetMapper.h>
#include <vtkActor.h>
#include <vtkRenderWindow.h>
#include <vtkRenderer.h>
#include <vtkRenderWindowInteractor.h>
#include <vtkGL2PSExporter.h>
#include <iostream>

int main(int argc, char *argv[])
{
    vtkSmartPointer<vtkXMLUnstructuredGridReader> reader =
        vtkSmartPointer<vtkXMLUnstructuredGridReader>::New();
    reader->SetFileName(argv[1]);
    reader->Update();

    vtkSmartPointer<vtkDataSetMapper> mapper =
        vtkSmartPointer<vtkDataSetMapper>::New();
    mapper->SetInputConnection(reader->GetOutputPort());

    vtkSmartPointer<vtkActor> actor =
        vtkSmartPointer<vtkActor>::New();
    actor->SetMapper(mapper);

    vtkSmartPointer<vtkRenderer> renderer =
        vtkSmartPointer<vtkRenderer>::New();

    vtkSmartPointer<vtkRenderWindow> window =
        vtkSmartPointer<vtkRenderWindow>::New();
    window->AddRenderer(renderer);

    vtkSmartPointer<vtkRenderWindowInteractor> interactor =
        vtkSmartPointer<vtkRenderWindowInteractor>::New();
    interactor->SetRenderWindow(window);

    renderer->AddActor(actor);

    window->Render();
    interactor->Start();

    vtkSmartPointer<vtkGL2PSExporter> exporter =
        vtkSmartPointer<vtkGL2PSExporter>::New();
    exporter->SetInput(window);
    exporter->SetSortToBSP();
    exporter->CompressOff();
    exporter->SetFilePrefix("hello");
    exporter->Write();

    return EXIT_SUCCESS;
}
                        

Recommendation: HDF5

  • Good for structured, tabular data
  • Self-descriptive format
  • Organized like a mini filesystem
  • HDF5 documentation

Managing HDF5 data

  • Full-featured Python/C/C++/etc. interfaces
  • Can also use command-line tools, like h5dump
  • In addition to datasets, can store “attributes” that further describe the data (example: parameter values)

HDF5 and Python


import h5py as h5

# Open the file for reading
data = h5.File('mydata.h5', 'r')

# Access an attribute
print data.attrs['foo']

# Access a dataset -- it's a NumPy array!
print data['bar'][:]
                        

Visualization and Presentation Tools

Making animations

matplotlib is the obvious go-to, but sometimes...

  • You want to point-and-click
  • Python may not handle your data format (example: VTK)
  • You want to “explore” your dataset
  • You're in a hurry!

Recommendation: VisIt

  • Published by Lawrence Livermore National Laboratory and has a healthy user base
  • Nice GUI interface and tons of options
  • Optimized for terascale data but still works well at small scales
  • Downside: Can't output vector graphics
  • VisIt download

Example animation from VisIt

Interactivity

  • Even journals are now pushing HTML content over PDF
  • Good interactive plots can make your website awesome!

Recommendation: D3.js

  • Runs in a browser, can host on your CfA webspace
  • With a little creativity, make it beautiful!
  • Particularly good for “custom” visualization types, but can still handle more traditional plots
  • D3.js documentation

D3.js examples

Making presentations

  • PowerPoint and Keynote: Good for point-and-click, but support is an issue
  • Beamer: Can be beautiful, but use only as directed — has been known to cause insanity
  • Google Slides: Getting warmer! But still requires the use of Google's interface
  • So what's left?

Recommendation: reveal.js

  • Host your slides on your CfA webspace, no server-side code needed
  • Lots of features and beautiful templates
  • Can be accessed from any web browser — no more compatibility issues!
  • Seamless presenter notes and timer view, just like PowerPoint
  • reveal.js documentation and repo
  • For those who don't like HTML coding, there is a collaborative web editor, slides.com

Example reveal.js snippet


<div class="reveal">
    <div class="slides">
        <section>
            <h1>Scientific Coding: Tools You Might Not Know About</h1>
            <h2>Ellen M. Price</h2>
            <h3>Pizza Ponder, 14 August 2018</h3>
        </section>
    </div>
</div>
                        

Even More Tools

A growing list...

  • Numba for high-performance Python
  • HoloViews for interactive Python plotting
  • Dask for scalable, parallel Python