
Plotting tools for Linux: matplotlib

February 4, 2015

This article was contributed by Lee Phillips

In our last installment, we introduced gnuplot, which is a standalone, language-agnostic plotting and analysis system. If you're using Python for data analysis, science, or statistics—or if you are considering it—you will want to know about matplotlib, a library that lets you produce publication-ready plots of all kinds, directly from your Python code. In contrast with the standalone gnuplot, matplotlib is a Python library, defining a large number of functions for you to use in your Python programs and from the interactive Python prompt.

Libraries and standalone programs each have their virtues and vices; which kind of solution you choose depends on your workflow, collaboration situation, plans, and inclinations. The tradeoffs are similar to choosing between a general-purpose database and a language-specific object storage system for persistent data. Standalone systems can be used from any language or a variety of languages at the same time, and they can be chosen on the basis of their performance and reliability—with little regard to the language in use.

A language-specific system can offer the programmer more fluency, flexibility, and control, and will generally require less training. When you become familiar with the matplotlib system, you will be able to use it to express the results of any Python program in graphical form.

Matplotlib, SciPy, NumPy, and all that

A swirl of names and packages has accumulated in the world of scientific Python that can be confusing to someone trying to navigate the territory for the first time. Let’s try to disentangle the terminology.

The first layer above Python itself is NumPy. This is really what makes science and data analysis with Python possible. NumPy enhances Python with an array type and extends the usual mathematical operators to act elementwise on arrays. The library also provides other linear algebra operations such as inner products, and higher-level functions including such things as Fourier transforms. It’s all implemented in C or Fortran, and is quite fast. NumPy has superseded the Numeric and numarray libraries.
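
As a rough illustration of the features just listed, here is a minimal sketch; the array values are made up for the example and are not from any real data set:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

# The usual arithmetic operators act element by element on arrays.
total = a + b
product = a * b

# The inner (dot) product is provided as a function.
inner = np.dot(a, b)   # 1*4 + 2*5 + 3*6

# Higher-level routines such as the FFT live in submodules.
# A constant signal puts all of its weight in the first frequency bin.
spectrum = np.fft.fft(np.ones(4))

print(total, product, inner, spectrum)
```

Note that a * b is the elementwise product, not a matrix or inner product; that is what np.dot is for.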

Matplotlib is the extensive plotting library that is the subject of this article. The data to be plotted in matplotlib is provided in the form of NumPy arrays. The plotting functions also accept lists, but convert them to arrays.
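
The list-to-array conversion can be observed directly. The sketch below assumes the non-interactive Agg backend so that it runs without a display; the data values are arbitrary:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen, no window needed
import matplotlib.pyplot as plt
import numpy as np

# Plain Python lists are accepted as plot data.
(line,) = plt.plot([0, 1, 2, 3], [0, 1, 4, 9])

# orig=False asks for the processed data that matplotlib keeps internally.
xdata = line.get_xdata(orig=False)
print(type(xdata))

plt.close("all")
```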

PyLab is a module designed to make interactive computing with NumPy and matplotlib more convenient. We'll have more to say about this one below.

Pyplot is a submodule of matplotlib that provides one of its two major interfaces; there will be plenty of pyplot later on.

SciPy is a large collection of packages (the project itself calls it an "ecosystem") for doing science and mathematics with Python. SciPy bundles together all the libraries mentioned in this section and adds packages for symbolic math, data analytics, and numerical computing (special functions, statistics, integration, and much more).

Everything mentioned in this section is free and open source, widely used, and under active development.

Installation

There are a surprisingly large number of ways to install matplotlib. Of course, you need Python, as well as NumPy, and you probably want the advanced interactive shell IPython, which makes working with Python in general—and matplotlib in particular—far more convenient than the standard read–eval–print loop (REPL).

Python is already present in most Linux distributions, and there is a good chance you already have NumPy as well. If you don't, it will most likely be available from your package-management system. If not, or if the version there is too old, you can download and compile the sources. Matplotlib supported only Python 2 until the end of 2012, but as of version 1.2 supports Python 3 as well.

The same strategies apply to IPython and matplotlib itself. Both of these can also be installed with pip (a Python package management utility).

Alternatively, you can get everything you need by installing SciPy. Yet another route is to download a precompiled bundle of packages put together by a commercial entity and provided free of charge; at the moment there are several of these (the most well-established being Enthought). The obvious disadvantages of downloading one of these large environments are, first, that you will likely be installing plenty of stuff that you don't need and, second, that you have no control over the versions of the various components. But this approach can be very convenient.

As a case study, I offer my own installation strategy, on my laptop running Ubuntu 12.04: I have Python, IPython, and NumPy from the standard Ubuntu repositories. The matplotlib version available from those repositories was too old (meaning merely that I wanted some recently added features), so I downloaded the tarball of the latest stable version (1.4.2). It would not compile, because my version of pyparsing was too old, but a more recent version was available through pip. With this in place, a simple python setup.py install gave me the version of matplotlib used for all the examples in this article.
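
Whichever route you take, a quick check from Python shows which versions actually ended up on the import path. This is only a sanity check, not part of the installation procedure described above:

```python
import matplotlib
import numpy

# Report the versions that Python will actually import.
print("matplotlib", matplotlib.__version__)
print("NumPy", numpy.__version__)
```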

Using matplotlib

In order to create a simple plot of the sine and cosine functions, as shown here, we can run the Python program below:

[Sine and cosine]
    import numpy as np
    from numpy import pi
    import matplotlib.pyplot as plt

    x = np.arange(-pi, pi, pi/1000)
    y1 = np.sin(x)
    y2 = np.cos(x)
    plt.title("Circular Functions")
    plt.xlabel("x")
    plt.ylabel("y")
    plt.plot(x, y1)
    plt.plot(x, y2)
    plt.grid(True)
    plt.show()

The first three lines import NumPy (which is always required), a math constant from NumPy, and the pyplot matplotlib submodule. Then, x is set up as a NumPy array by specifying its start, end, and interval. The lines defining the y1 and y2 arrays make use of the elementwise action of the functions from the NumPy library: there is no loop, map, or list comprehension required. The three lines after that set the title and axis labels, and the two plot calls draw the sine and cosine curves. Finally, we turn on the gridlines and cause the graph to be displayed in its own window.
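
To make the "no loop required" point concrete, here is a small sketch comparing the elementwise call with the equivalent list comprehension. The coarse grid is chosen only to keep the output short:

```python
import math
import numpy as np

x = np.arange(0.0, 1.0, 0.25)   # start, stop (excluded), step

# Elementwise NumPy call: one expression, no explicit loop.
y_numpy = np.sin(x)

# The same values computed the long way, one element at a time.
y_loop = [math.sin(v) for v in x]

print(x)
print(np.allclose(y_numpy, y_loop))
```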

The style of programming shown here is neither functional nor object-oriented. The pyplot functions manipulate an implicit global state that defines attributes of the current graph, which comes into existence with the first call to a pyplot function. If this script is saved on disk and run with the Python interpreter, a window will pop up containing the plot and some interactive controls. This is a side-effect of the final line. The plot window will remain open until it is closed or the program is killed. Its controls allow you to perform several kinds of adjustments to the graph and to save a permanent copy in any of several formats.
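
The implicit state can be inspected with pyplot's gcf() and gca() functions, which return the current figure and axes. The following sketch assumes the off-screen Agg backend so that no window appears:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen
import matplotlib.pyplot as plt

plt.close("all")                  # start with no figures
plt.title("Circular Functions")   # first pyplot call: figure and axes appear

fig = plt.gcf()   # "get current figure"
ax = plt.gca()    # "get current axes"

# The title set through pyplot landed on the implicitly created axes.
title = ax.get_title()
same_axes = ax is fig.axes[0]
print(title, same_axes)

plt.close("all")
```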

The reason that pyplot employs this unorthodox programming style is to provide a convenient interface for interactive exploration, as well as offering a familiar environment for people coming from MATLAB.

If you are using matplotlib to explore, you will benefit from using IPython, which provides a special mode designed to work smoothly with pyplot. To invoke this interface, put IPython into PyLab mode, either by starting it up with ipython --pylab or starting it normally and entering the magic %pylab command at the prompt.

In PyLab mode, the first pyplot command puts up a plot window, but the interpreter remains unblocked, so you can continue to enter commands. Each pyplot command then updates the plot immediately. In addition, pyplot and NumPy are loaded into the namespace, so you can type things like title("The title") without a prefix. The overall interactive experience is similar to working at the gnuplot prompt, but with the entire Python language available, and without having to ever type replot.

To turn this script into one that saves the plot in a file, replace the plt.show() call on the last line with a call to plt.savefig(), passing a path whose extension selects the format. For example, plt.savefig('figure.png') saves the plot in a PNG file.
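
A minimal sketch of saving the same figure in two formats follows. The temporary directory and file names are placeholders for wherever you actually want the output, and the Agg backend is assumed so the script runs without a display:

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(-np.pi, np.pi, np.pi / 100)
plt.plot(x, np.sin(x))

outdir = tempfile.mkdtemp()   # placeholder output location
png_path = os.path.join(outdir, "figure.png")
pdf_path = os.path.join(outdir, "figure.pdf")

plt.savefig(png_path)   # format chosen by the .png extension
plt.savefig(pdf_path)   # same figure as vector PDF
plt.close("all")

print(os.path.getsize(png_path) > 0, os.path.getsize(pdf_path) > 0)
```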

There are several approaches to altering the details and appearance of the plot. For global changes in style, you can use matplotlib's stylesheet support: inserting the command plt.style.use('dark_background') does just what it sounds like, for example. There is even an option available to make xkcd-like, pen-and-ink–style graphs. For more detailed control, you can add keywords to some of the plot commands. For example, if we replaced the second plot command in our script above with plt.plot(x, y2, linewidth=2.0), the cosine curve would be plotted with a thicker line.
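
Both kinds of adjustment can be combined, as in the sketch below. It uses plt.style.context(), which applies a stylesheet only inside the with block rather than globally as plt.style.use() does; that is a choice made for this example. The Agg backend is assumed:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(-np.pi, np.pi, np.pi / 100)

# The stylesheet is active only inside this block.
with plt.style.context("dark_background"):
    plt.figure()
    (thin,) = plt.plot(x, np.sin(x))                   # default line width
    (thick,) = plt.plot(x, np.cos(x), linewidth=2.0)   # keyword override
    thick_width = thick.get_linewidth()

# The names of the installed stylesheets are listed here.
has_dark = "dark_background" in plt.style.available
print(thick_width, has_dark)

plt.close("all")
```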

For even more detailed control, matplotlib supplies an object-oriented interface that allows every property of every element of the graph to be controlled (using getters and setters). This is where you must turn if you have a problem that cannot be solved through the pyplot interface, or if you're interested in creating a specialized and nonstandard type of plot. We don't have enough space to delve into this layer of matplotlib, but the documentation is pretty good. Nevertheless, we do need to explore the edges a bit in order to make 3D plots.
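
To give a flavor of the getter and setter style, here is a brief sketch. The particular properties changed are arbitrary, and the Agg backend is assumed:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(-np.pi, np.pi, np.pi / 100)

fig, ax = plt.subplots()          # explicit figure and axes objects
(line,) = ax.plot(x, np.cos(x))   # plot returns the line objects it created

# Setters change properties of individual elements.
line.set_linewidth(2.0)
line.set_linestyle("--")
ax.set_title("Cosine")
ax.set_xlim(-np.pi, np.pi)

# Getters read them back.
width = line.get_linewidth()
style = line.get_linestyle()
title = ax.get_title()
xlim = ax.get_xlim()
print(width, style, title, xlim)

plt.close(fig)
```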

The third dimension

Matplotlib has excellent support for surface, image, contour, vector, and other kinds of plots beyond 2D curves. Here is a minimal example that creates a surface plot:

    from mpl_toolkits.mplot3d import Axes3D
    from matplotlib import cm
    import matplotlib.pyplot as plt
    import numpy as np
    from numpy import pi, sin, exp

    pr = 20.
    yend = 4.0
    x = np.arange(0, pi, pi/pr)
    y = np.arange(0, yend, yend/pr)
    x, y = np.meshgrid(x, y)
    z = exp(-y)*sin(2*x)
    ax = plt.figure().add_subplot(111, projection='3d')  # gca(projection=...) fails on recent releases
    ax.set_xlim(0, pi)
    ax.set_ylim(0, yend)
    surf = ax.plot_surface(x, y, z, rstride=1, cstride=1, cmap=cm.coolwarm)
    plt.show()

There are a couple of additional imports here. Axes3D is a separate module, distributed with matplotlib, that handles perspective plotting of meshes and surfaces. The cm module defines color palettes; we're going to pick one to color our surface.

Next, we establish our NumPy coordinate arrays as before. We take advantage of the convenient NumPy meshgrid function, which takes two 1D coordinate arrays and expands them into a pair of 2D coordinate matrices, one holding the x value and one the y value at each grid point. z is the value that defines the surface, again defined with the help of NumPy's elementwise math.
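
What meshgrid does is easiest to see on a tiny grid. The values below are made up for the illustration:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0])   # 3 points along x
y = np.array([10.0, 20.0])      # 2 points along y

X, Y = np.meshgrid(x, y)

# Each result has one row per y value and one column per x value.
print(X)
print(Y)

# Elementwise math on the pair evaluates a function at every grid point.
Z = X + Y
print(Z)
```

Here X repeats the x values along every row and Y repeats the y values down every column, so Z[i, j] is the function evaluated at (x[j], y[i]).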

[3D example]

The line after the calculation of z sets ax to be an axes object. This is the matplotlib object that directly contains most of the plot elements, defines the coordinate system, and that you most often deal with when using matplotlib's object-oriented interface. In the following two lines, we set the axis limits by using the setter interface. The final lines plot the surface in an interactive window as before.

Now the controls allow you to rotate the 3D plot around two axes. The rstride and cstride arguments determine how often the matrix of values is sampled to make the plot (r and c stand for row and column) and the cmap argument defines the palette for translating z values to colors. The image above is what we get.

LaTeX support

While matplotlib's involvement with LaTeX is not as intimate as gnuplot's, there is still excellent support for using TeX syntax and layout to typeset graph labels and some support for including graphs in documents.

The TeX typesetting algorithms have been reimplemented in matplotlib, which also ships with a collection of TeX fonts. In order to use LaTeX syntax to place text on the graph, just use the same calls as usual, and place the LaTeX between dollar signs. You will need to use raw strings for the labels, denoted in Python with an r before the string, to avoid having all of LaTeX's backslashes interpreted as escape sequences:

    import numpy as np
    from numpy import pi, sin, exp
    import matplotlib.pyplot as plt

    x = np.arange(0.03, 1, 0.0001)
    y = sin(1/x) * exp(-x)
    plt.xlabel(r"$x$", fontsize = 18)
    plt.ylabel(r"$\mathcal{F}$", fontsize = 18)
    plt.text(0.25, 0.75, r"$\mathcal{F} = e^{-x}\sin(\frac{1}{x})$", fontsize = 24)
    plt.plot(x,y)
    plt.show()

This gives us the following:

[LaTeX example]

Notice the standard LaTeX syntax leading to typeset results that look just like real LaTeX. If matplotlib's implementation of LaTeX is not sufficient for your application, there is an option to call out to LaTeX—but, of course, this means you need to have LaTeX installed.
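
The need for raw strings in the labels above can be seen in plain Python, without matplotlib at all. In this small sketch, the \f at the start of \frac is silently turned into a form-feed character unless the string is raw:

```python
plain = "$\frac{1}{x}$"    # "\f" is read as a single form-feed character
raw = r"$\frac{1}{x}$"     # raw string: the backslash survives intact

# The raw string is one character longer because it keeps the backslash.
print(len(plain), len(raw))
print("\\frac" in plain, "\\frac" in raw)
```

A label built from the first string would reach matplotlib with the command already damaged, which is why the r prefix matters.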

PGF is a macro package included with most large LaTeX installations. It is usually used in conjunction with its syntax layer TikZ, together providing a domain-specific language for generating all kinds of diagrams directly from LaTeX. Matplotlib can save graphs in PGF format: doing so will create a file full of PGF commands rather than an image file.

Let's try to reproduce our example from the previous article in this series, where we used a special gnuplot terminal to include a graph in a LaTeX document. First, here is the Python script that creates the graph file:

    import numpy as np
    from numpy import exp, sqrt, pi
    import matplotlib.pyplot as plt

    x = np.arange(-4, 4, 0.001)
    s = 1
    y1 = exp(-x**2/(2*s**2))/(s*sqrt(2*pi))
    s = 2
    y2 = exp(-x**2/(2*s**2))/(s*sqrt(2*pi))
    plt.text(-3.5, .34, r'$\frac{1}{\sqrt{2\pi}\sigma}\,e^{-\frac{x^2}{2\sigma^2}}$', fontsize = 30)
    plt.text(0.95, .3, r'$\sigma = 1$', fontsize = 24)
    plt.text(2.7, .1, r'$\sigma = 2$', fontsize = 24)
    plt.plot(x, y1)
    plt.plot(x, y2)
    plt.savefig('normal.pgf')

This should all be familiar by now; the only thing new is the pgf file extension. After running this, the file normal.pgf will be an ASCII file full of PGF macro commands, suitable for inclusion in a LaTeX file (using \input):

    \documentclass{article}
    \usepackage{graphicx}
    \usepackage[rflt]{floatflt}
    \usepackage{pgf}
    \pagestyle{empty}
    \begin{document}
    \begin{floatingfigure}{2.9in}
    \resizebox{2.5in}{!}{\input{normal.pgf}}
    \end{floatingfigure}
    \noindent The figure on the right illustrates the normal, or Gaussian distribution,
    \[ {\cal N}(x; \sigma) = \frac{1}{\sqrt{2\pi}\sigma}\,e^{-\frac{x^2}{2\sigma^2}} \]
    plotted for two values of the standard deviation, $\sigma$.
    According to the central limit theorem, certain other probability distributions, 
    including the binomial distribution, tend
    to the normal distribution in the limit of a large ``sample size.''
    \end{document}

If you process this with LaTeX, the output should look like this:

[Pgf example]

Conclusion

If you're committed to Python for data science or for post-processing the results of experiments or simulations, matplotlib is an obvious choice. It has become a completely mature solution for any kind of publication-quality technical graphics. We could only scratch the surface here, but matplotlib makes it easy to produce not only all the usual scientific plots, but such things as bar and pie charts as well.

While matplotlib has grown into a fairly huge and complex set of libraries, the documentation is pretty good and its wide adoption means help is easy to come by. Finally, it's easy to start with simple things and learn more as the need arises.



Plotting tools for Linux: matplotlib

Posted Feb 5, 2015 2:31 UTC (Thu) by garrison (subscriber, #39220)

In fact, matplotlib is not just for Python programmers. I recently switched from Python to Julia for all my scientific programming, and matplotlib is one of the few Python libraries in my workflow that came along for the ride. The PyPlot.jl package provides a very nice interface to matplotlib from Julia.

Plotting tools for Linux: matplotlib

Posted Feb 5, 2015 13:08 UTC (Thu) by leephillips (subscriber, #100450)

Julia is exceptional in the ease with which you can use Python (as well as Fortran and C) libraries. And it's a wonderful language.

Plotting tools for Linux: matplotlib

Posted Feb 5, 2015 3:02 UTC (Thu) by neilbrown (subscriber, #359)

I tried the examples - cut/paste into a file and run "python file.py" - and ... nothing happened.

After some exploration, running

    echo 'backend : GTKCairo' > matplotlibrc
made it show the result. Maybe this is distro-specific.

Thanks!

Plotting tools for Linux: matplotlib

Posted Feb 5, 2015 13:05 UTC (Thu) by leephillips (subscriber, #100450)

I think it probably is. Thank you for this comment - it is sure to help a lot of people.

Plotting tools for Linux: matplotlib

Posted Feb 6, 2015 18:52 UTC (Fri) by jsanders (subscriber, #69784)

Can I plug my Veusz plotting package: http://home.gna.org/veusz/ ? It's based on Python, but you get a powerful Qt GUI combined with a scriptable plotting interface. You can even add plugins for importing data, data manipulation or scripting.

matplotlib in the browser

Posted Feb 6, 2015 21:45 UTC (Fri) by debacle (subscriber, #7114)

Combining matplotlib with D3.js, mpld3 brings many (but not all) matplotlib graphs to the web browser.

matplotlib in the browser

Posted Feb 7, 2015 0:44 UTC (Sat) by leephillips (subscriber, #100450)

That is wonderful stuff. Very powerful and convenient way to make interactive plots for the web.

not just for python projects

Posted Feb 10, 2015 16:23 UTC (Tue) by droundy (guest, #4559)

The author makes a false distinction between "language-agnostic" gnuplot and the Python library matplotlib. Both tools force you to use a specific language; it's just that in gnuplot's case you are forced to use a custom language, while in matplotlib you are forced to use Python. My research group has switched entirely to matplotlib for plotting, but does little else in Python. The trouble with gnuplot is that while its language is powerful, sometimes it is not good enough, and then you have to write one program to create another, which is a pain, and kills the readability of your code.


Copyright © 2015, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds