Plotting tools for Linux: matplotlib
In our last installment, we introduced gnuplot, which is a standalone, language-agnostic plotting and analysis system. If you're using Python for data analysis, science, or statistics—or if you are considering it—you will want to know about matplotlib, a library that lets you produce publication-ready plots of all kinds, directly from your Python code. In contrast with the standalone gnuplot, matplotlib is a Python library, defining a large number of functions for you to use in your Python programs and from the interactive Python prompt.
Libraries and standalone programs each have their virtues and vices; which kind of solution you choose depends on your workflow, collaboration situation, plans, and inclinations. The tradeoffs are similar to choosing between a general-purpose database and a language-specific object storage system for persistent data. Standalone systems can be used from any language or a variety of languages at the same time, and they can be chosen on the basis of their performance and reliability—with little regard to the language in use.
A language-specific system can offer the programmer more fluency, flexibility, and control, and will generally require less training. When you become familiar with the matplotlib system, you will be able to use it to express the results of any Python program in graphical form.
Matplotlib, SciPy, NumPy, and all that
A swirl of names and packages has accumulated in the world of scientific Python that can be confusing to someone trying to navigate the territory for the first time. Let’s try to disentangle the terminology.
The first layer above Python itself is NumPy. This is really what makes science and data analysis with Python possible. NumPy enhances Python with an array type and extends the usual mathematical operators to act elementwise on arrays. The library also provides other linear algebra operations such as inner products, and higher-level functions including such things as Fourier transforms. It’s all implemented in C or Fortran, and is quite fast. NumPy has superseded the Numeric and numarray libraries.
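As a rough illustration of the features just described, the short sketch below applies elementwise arithmetic, an inner product, and a Fourier transform to small arrays. The array values and variable names are made up for the example.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

# Arithmetic operators act element by element; no explicit loop is needed.
total = a + b        # [5., 7., 9.]
product = a * b      # [4., 10., 18.]

# The inner (dot) product collapses the two arrays to a single number.
inner = np.dot(a, b)  # 1*4 + 2*5 + 3*6 = 32.0

# The Fourier transform of a constant signal puts all the weight in bin 0.
spectrum = np.fft.fft(np.ones(4))
```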
Matplotlib is the extensive plotting library that is the subject of this article. The data to be plotted in matplotlib is provided in the form of NumPy arrays. The plotting functions also accept lists, but convert them to arrays.
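A small sketch of the list-to-array conversion: we hand plot() ordinary Python lists and then ask the resulting line object for its data. The "Agg" backend is an assumption made here so that the snippet runs without opening a window; the values are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: off-screen backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np

xs = [0, 1, 2, 3]   # ordinary Python lists
ys = [0, 1, 4, 9]
(line,) = plt.plot(xs, ys)   # plot() returns a list of line objects

stored = line.get_ydata()    # the data as matplotlib holds it
is_array = isinstance(stored, np.ndarray)
plt.close("all")
```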
PyLab is a module designed to make interactive computing with NumPy and matplotlib more convenient. We'll have more to say about this one below.
Pyplot is a submodule of matplotlib that provides one of its two major interfaces; there will be plenty of pyplot later on.
SciPy is a large collection of packages (the project itself calls it an "ecosystem") for doing science and mathematics with Python. SciPy bundles together all the libraries mentioned in this section and adds packages for symbolic math, data analytics, and numerical computing (special functions, statistics, integration, and much more).
Everything mentioned in this section is free and open source, widely used, and under active development.
Installation
There are a surprisingly large number of ways to install matplotlib. Of course, you need Python, as well as NumPy, and you probably want the advanced interactive shell IPython, which makes working with Python in general—and matplotlib in particular—far more convenient than the standard read–eval–print loop (REPL).
Python is already present in most Linux distributions, and there is a good chance you already have NumPy as well. If you don't, it will most likely be available from your package-management system. If not, or if the version there is too old, you can download and compile the sources. Matplotlib supported only Python 2 until the end of 2012, but as of version 1.2 supports Python 3 as well.
The same strategies apply to IPython and matplotlib itself. Both of these can also be installed with pip (a Python package management utility).
Alternatively, you can get everything you need by installing SciPy. Yet another route is to download a precompiled bundle of packages put together by a commercial entity and provided free of charge; at the moment there are several of these (the most well-established being Enthought). The obvious disadvantages of downloading one of these large environments are, first, that you will likely be installing plenty of stuff that you don't need and, second, that you have no control over the versions of the various components. But this approach can be very convenient.
As a case study, I offer my own installation strategy, on my laptop running Ubuntu 12.04: I have Python, IPython, and NumPy from the standard Ubuntu repositories. The matplotlib version available from those repositories was too old (meaning merely that I wanted some recently added features), so I downloaded the tarball of the latest stable version (1.4.2). It would not compile, because my version of pyparsing was too old, but a more recent version was available through pip. With this in place, a simple python setup.py install gave me the version of matplotlib used for all the examples in this article.
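Whichever route you take, it can be useful to confirm which versions actually ended up on your system. One simple way to do that, sketched here, is to print the version strings that the packages themselves report:

```python
import sys

import matplotlib
import numpy

# Collect the version strings reported by the interpreter and the libraries.
versions = {
    "python": sys.version.split()[0],
    "numpy": numpy.__version__,
    "matplotlib": matplotlib.__version__,
}
for name, version in versions.items():
    print(name, version)
```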
Using matplotlib
In order to create a simple plot of the sine and cosine functions, as shown here, we can run the Python program below:
import numpy as np
from numpy import pi
import matplotlib.pyplot as plt
x = np.arange(-pi, pi, pi/1000)
y1 = np.sin(x)
y2 = np.cos(x)
plt.title("Circular Functions")
plt.xlabel("x")
plt.ylabel("y")
plt.plot(x, y1)
plt.plot(x, y2)
plt.grid(True)
plt.show()
The first three lines import NumPy (which is always required), a math constant from NumPy, and the pyplot matplotlib submodule. Then, x is set up as a NumPy array by specifying its start, end, and interval. The lines defining the y1 and y2 arrays make use of the elementwise action of the functions from the NumPy library: there is no loop, map, or list comprehension required. The three lines after that cause the title and axis labels to appear on the graph, and the two plt.plot() calls draw the sine and cosine curves. Finally, we turn on the gridlines and cause the graph to be displayed in its own window.
The style of programming shown here is neither functional nor object-oriented. The pyplot functions manipulate an implicit global state that defines the attributes of the current graph, which comes into existence with the first call to a pyplot function. If this script is saved on disk and run with the Python interpreter, a window will pop up containing the plot and some interactive controls. This is a side effect of the final line. The plot window will remain open until you close it or the program is killed. Its controls allow you to perform several kinds of adjustments to the graph and to save a permanent copy in any of several formats.
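To make the array setup concrete, here is a tiny sketch using a coarser grid than the script above (the 0.25 step is chosen only to keep the numbers readable). Note that arange() excludes the end value, and that applying np.sin to the whole array gives the same result as an explicit loop:

```python
import math

import numpy as np

x = np.arange(0.0, 1.0, 0.25)   # start, end (excluded), step
# x is now [0.0, 0.25, 0.5, 0.75]; the end value 1.0 is not included.

y = np.sin(x)                    # one call, applied to every element

# The equivalent explicit loop, shown only for comparison.
y_loop = np.array([math.sin(v) for v in x])
```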
The reason that pyplot employs this unorthodox programming style is to provide a convenient interface for interactive exploration, as well as offering a familiar environment for people coming from MATLAB.
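The implicit state can be inspected directly. In the sketch below, plt.gcf() and plt.gca() return the "current" figure and axes that the pyplot functions operate on behind the scenes. The "Agg" backend is an assumption made so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: off-screen backend, so no window is needed
import matplotlib.pyplot as plt

plt.close("all")
plt.title("Circular Functions")  # first pyplot call creates a figure and axes

fig = plt.gcf()   # "get current figure"
ax = plt.gca()    # "get current axes"

current_title = ax.get_title()   # the title set above landed on these axes
same_figure = ax.figure is fig

plt.figure()                      # a new figure becomes the current one
switched = plt.gcf() is not fig
plt.close("all")
```

After the call to plt.figure(), any further pyplot commands would apply to the new figure rather than the first one.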
If you are using matplotlib to explore, you will benefit from using IPython, which provides a special mode designed to work smoothly with pyplot. To invoke this interface, put IPython into PyLab mode, either by starting it up with ipython --pylab or starting it normally and entering the magic %pylab command at the prompt.
In PyLab mode, the first pyplot command puts up a plot window, but the interpreter remains unblocked, so you can continue to enter commands. Each pyplot command then updates the plot immediately. In addition, pyplot and NumPy are loaded into the namespace, so you can type things like title("The title") without a prefix. The overall interactive experience is similar to working at the gnuplot prompt, but with the entire Python language available, and without having to ever type replot.
To turn this script into one that saves the plot in a file, simply replace the last line with a call to plt.savefig(), giving it the path you want; the file extension determines the format. For example, plt.savefig('figure.png') saves the plot in a PNG file.
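As a sketch of how the extension selects the format, the following saves the same plot three times. The temporary directory and the file names are assumptions for the example; in practice you would use whatever path suits you.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: off-screen backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(-np.pi, np.pi, np.pi / 1000)
plt.plot(x, np.sin(x))
plt.plot(x, np.cos(x))

# Assumption: a temporary directory stands in for your real output location.
outdir = tempfile.mkdtemp()
paths = [os.path.join(outdir, "figure." + ext) for ext in ("png", "pdf", "svg")]
for path in paths:
    plt.savefig(path)   # the extension selects the output format
plt.close("all")
```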
There are several approaches to altering the details and appearance of the plot. For global changes in style, you can use matplotlib's stylesheet support: inserting the command plt.style.use('dark_background') does just what it sounds like, for example. There is even an option available to make xkcd-like, pen-and-ink–style graphs. For more detailed control, you can add keywords to some of the plot commands. For example, if we replaced the second plot command in our script above with plt.plot(x, y2, linewidth=2.0), the cosine curve would be plotted with a thicker line.
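The two levels of control described here can be combined, as in the sketch below. It uses plt.style.context(), which applies a stylesheet only inside the with block (plt.style.use() would change the settings for the rest of the session), and passes keywords to one of the plot calls. The dashed line style is an extra touch added for the example.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: off-screen backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(-np.pi, np.pi, np.pi / 1000)

# The stylesheet is in effect only while the figure is being built.
with plt.style.context("dark_background"):
    fig = plt.figure()
    plt.plot(x, np.sin(x))
    # Keywords adjust just this one curve: thicker and dashed.
    (cosine,) = plt.plot(x, np.cos(x), linewidth=2.0, linestyle="--")
    background = fig.get_facecolor()   # RGBA tuple of the figure background

width = cosine.get_linewidth()
plt.close("all")
```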
For even more detailed control, matplotlib supplies an object-oriented interface that allows every property of every element of the graph to be controlled (using getters and setters). This is where you must turn if you have a problem that cannot be solved through the pyplot interface, or if you're interested in creating a specialized and nonstandard type of plot. We don't have enough space to delve into this layer of matplotlib, but the documentation is pretty good. Nevertheless, we do need to explore the edges a bit in order to make 3D plots.
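For a first taste of that interface, here is a minimal sketch of the sine and cosine plot written against explicit figure, axes, and line objects. It is only one of several ways to obtain those objects, and the "Agg" backend is again an assumption made so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: off-screen backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(-np.pi, np.pi, np.pi / 1000)

fig, ax = plt.subplots()            # explicit figure and axes objects
(sine,) = ax.plot(x, np.sin(x))
(cosine,) = ax.plot(x, np.cos(x))

# Setters change a property of one specific element...
ax.set_title("Circular Functions")
ax.set_xlabel("x")
cosine.set_linewidth(2.0)
cosine.set_linestyle("--")

# ...and the matching getters read it back.
title = ax.get_title()
xlabel = ax.get_xlabel()
width = cosine.get_linewidth()
plt.close(fig)
```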
The third dimension
Matplotlib has excellent support for surface, image, contour, vector, and other kinds of plots beyond 2D curves. Here is a minimal example that creates a surface plot:
from mpl_toolkits.mplot3d import Axes3D
from matplotlib import cm
import matplotlib.pyplot as plt
import numpy as np
from numpy import pi, sin, exp
pr = 20.
yend = 4.0
x = np.arange(0, pi, pi/pr)
y = np.arange(0, yend, yend/pr)
x, y = np.meshgrid(x, y)
z = exp(-y)*sin(2*x)
ax = plt.figure().add_subplot(111, projection='3d')
ax.set_xlim(0, pi)
ax.set_ylim(0, yend)
surf = ax.plot_surface(x, y, z, rstride=1, cstride=1, cmap=cm.coolwarm)
plt.show()
There are a couple of additional imports here. Axes3D is a class from the mplot3d toolkit, which is distributed with matplotlib and handles perspective plotting of meshes and surfaces; with the matplotlib version used here, importing it is what makes the '3d' projection available. The cm module defines color palettes; we're going to pick one to color our surface.
Next, we establish our NumPy coordinate arrays as before. We take advantage of the convenient NumPy meshgrid function, which takes two 1D coordinate arrays and expands them into a pair of 2D coordinate matrices, one holding the x value and one the y value at every grid point. z is the value that defines the surface, again computed with the help of NumPy's elementwise math.
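What meshgrid does is easiest to see on a tiny grid. The sketch below uses three x values and two y values (chosen arbitrarily, with capitalized names for the 2D results to keep them apart from the 1D inputs):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0])   # 3 points along x
y = np.array([10.0, 20.0])      # 2 points along y

X, Y = np.meshgrid(x, y)
# X repeats x along each row, Y repeats y down each column; both are 2 x 3:
# X = [[0, 1, 2],        Y = [[10, 10, 10],
#      [0, 1, 2]]             [20, 20, 20]]

Z = X + Y   # elementwise, so Z[i, j] = x[j] + y[i]
```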
The line after the calculation of z sets ax to be an axes object. This is the matplotlib object that directly contains most of the plot elements, defines the coordinate system, and that you most often deal with when using matplotlib's object-oriented interface. In the following two lines, we set the axis limits by using the setter interface. The final lines plot the surface in an interactive window as before.
Now the controls allow you to rotate the 3D plot around two axes. The rstride and cstride arguments determine how often the matrix of values is sampled to make the plot (r and c stand for row and column) and the cmap argument defines the palette for translating z values to colors. The image to the left is what we get.
LaTeX support
While matplotlib's involvement with LaTeX is not as intimate as gnuplot's, there is still excellent support for using TeX syntax and layout to typeset graph labels and some support for including graphs in documents.
The TeX typesetting algorithms have been reimplemented in matplotlib, which also ships with a collection of TeX fonts. In order to use LaTeX syntax to place text on the graph, just use the same calls as usual, and place the LaTeX between dollar signs. You will need to use raw strings for the labels, denoted in Python with an r before the string, to avoid having all of LaTeX's backslashes interpreted as escape sequences:
import numpy as np
from numpy import pi, sin, exp
import matplotlib.pyplot as plt
x = np.arange(0.03, 1, 0.0001)
y = sin(1/x) * exp(-x)
plt.xlabel(r"$x$", fontsize=18)
plt.ylabel(r"$\mathcal{F}$", fontsize=18)
plt.text(0.25, 0.75, r"$\mathcal{F} = e^{-x}\sin(\frac{1}{x})$", fontsize=24)
plt.plot(x, y)
plt.show()
This gives us the following:
Notice the standard LaTeX syntax leading to typeset results that look just like real LaTeX. If matplotlib's implementation of LaTeX is not sufficient for your application, there is an option to call out to LaTeX—but, of course, this means you need to have LaTeX installed.
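The need for raw strings is easy to demonstrate. In the sketch below, the same label is written with and without the r prefix; without it, Python turns the "\f" at the start of "\frac" into a form-feed character before matplotlib ever sees the string. The final lines typeset the raw version with matplotlib's built-in engine, using the "Agg" backend as an assumption so that no window is needed.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: off-screen backend, so no window is needed
import matplotlib.pyplot as plt

plain = "$\frac{1}{x}$"   # "\f" is read as a form-feed character
raw = r"$\frac{1}{x}$"    # the backslash is kept literally

has_backslash_plain = "\\" in plain   # False: the backslash was consumed
has_backslash_raw = "\\" in raw       # True

fig, ax = plt.subplots()
ax.set_ylabel(raw, fontsize=18)
fig.canvas.draw()   # forces the label to be typeset by the built-in engine
plt.close(fig)

# To hand text to an external LaTeX installation instead, the relevant
# setting is the "text.usetex" rcParam (not enabled in this sketch).
```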
PGF is a macro package included with most large LaTeX installations. It is usually used in conjunction with its syntax layer TikZ, together providing a domain-specific language for generating all kinds of diagrams directly from LaTeX. Matplotlib can save graphs in PGF format: doing so will create a file full of PGF commands rather than an image file.
Let's try to reproduce our example from the previous article in this series, where we used a special gnuplot terminal to include a graph in a LaTeX document. First, here is the Python script that creates the graph file:
import numpy as np
from numpy import exp, sqrt, pi
import matplotlib.pyplot as plt
x = np.arange(-4, 4, 0.001)
s = 1
y1 = exp(-x**2/(2*s**2))/(s*sqrt(2*pi))
s = 2
y2 = exp(-x**2/(2*s**2))/(s*sqrt(2*pi))
plt.text(-3.5, .34, r'$\frac{1}{\sqrt{2\pi}\sigma}\,e^{-\frac{x^2}{2\sigma^2}}$', fontsize=30)
plt.text(0.95, .3, r'$\sigma = 1$', fontsize=24)
plt.text(2.7, .1, r'$\sigma = 2$', fontsize=24)
plt.plot(x, y1)
plt.plot(x, y2)
plt.savefig('normal.pgf')
This should all be familiar by now; the only thing new is the pgf file extension. After running this, the file normal.pgf will be an ASCII file full of PGF macro commands, suitable for inclusion in a LaTeX file (using \input):
\documentclass{article}
\usepackage{graphicx}
\usepackage[rflt]{floatflt}
\usepackage{pgf}
\pagestyle{empty}
\begin{document}
\begin{floatingfigure}{2.9in}
\resizebox{2.5in}{!}{\input{normal.pgf}}
\end{floatingfigure}
\noindent The figure on the right illustrates the normal, or Gaussian distribution,
\[ {\cal N}(x; \sigma) = \frac{1}{\sqrt{2\pi}\sigma}\,e^{-\frac{x^2}{2\sigma^2}} \]
plotted for two values of the standard deviation, $\sigma$.
According to the central limit theorem, certain other probability distributions,
including the binomial distribution, tend
to the normal distribution in the limit of a large ``sample size.''
\end{document}
If you process this with LaTeX, the output should look like this:
Conclusion
If you're committed to Python for data science or for post-processing the results of experiments or simulations, matplotlib is an obvious choice. It has become a completely mature solution for any kind of publication-quality technical graphics. We could only scratch the surface here, but matplotlib makes it easy to produce not only all the usual scientific plots, but such things as bar and pie charts as well.
While matplotlib has grown into a fairly huge and complex set of libraries, the documentation is pretty good and its wide adoption means help is easy to come by. Finally, it's easy to start with simple things and learn more as the need arises.
| Index entries for this article | |
|---|---|
| GuestArticles | Phillips, Lee |
