
TensorFlow

By Nathan Willis
December 9, 2015

Google released a machine-learning system named TensorFlow on November 9, noting that its own research team has been using the project to explore "deep-learning" models and that the code can also be used to deploy real-world applications. The code itself includes a Python library for creating, training, and testing machine-learning networks, plus an execution framework written in C++ that allows users to run their networks on a range of devices, from mobile phone processors up to multi-GPU servers.

The announcement points out that TensorFlow is actually Google's second machine-learning system, designed to be more general and flexible than its predecessor, DistBelief (which was not released as an open-source project). Specifically, while DistBelief supported only neural networks, TensorFlow can be used to set up nearly any type of graph-based computation. That accounts for the name: a "tensor" is merely a multi-dimensional array, and TensorFlow's general approach is to construct a directed graph of operations on tensors. Tensors hold massive quantities of input or classification data; they are operated on and passed between nodes in the graph, potentially many times.

In machine learning, tensors can get very large indeed. This is particularly true where vision or language recognition is the goal, and those are the problem classes that Google says TensorFlow is best suited for. The "hello world" example in the TensorFlow documentation involves recognizing the hand-written numerals zero through nine; the tensors used include a 55,000-by-784 array just for the training samples (each of which is a 28x28 grayscale image).
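To get a feel for those dimensions, here is a minimal sketch (using plain NumPy rather than any TensorFlow API) of what that training array looks like; the contents are placeholders, not real MNIST data:

    import numpy as np

    # Stand-in for the MNIST training set described above: 55,000 images,
    # each a 28x28 grayscale bitmap, flattened to 784 values per image.
    images = np.zeros((55000, 28, 28), dtype="float32")
    train_x = images.reshape(55000, 784)
    print(train_x.shape)    # (55000, 784)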

Consequently, TensorFlow separates defining the graph from running it on actual data. The graph is modeled using the Python API, then it is instantiated and run in the speed-optimized C++ core. Underneath, TensorFlow relies on NumPy to efficiently manipulate the large tensor structures; machine-learning graphs typically take repeated samples from tensors, compute statistics on multiple slices and subsets of tensors, and re-evaluate those statistics in loops. That adds up to a lot of operations that need to be optimized.
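As a rough illustration of that split (a sketch written against the same early Python API as the tutorial below), slicing a tensor and computing a statistic on the slice merely adds nodes to the graph; nothing is evaluated until a session runs them:

    import numpy as np
    import tensorflow as tf

    data = np.random.rand(1000, 10).astype("float32")
    t = tf.constant(data)

    # Both of these are graph operations, not immediate computations.
    first_rows = tf.slice(t, [0, 0], [100, 10])   # first 100 rows
    col_means = tf.reduce_mean(first_rows, 0)     # mean of each column

    # Only now does any arithmetic actually happen.
    sess = tf.Session()
    print(sess.run(col_means))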

Currently, the TensorFlow Python package can be installed through pip, optionally in a Python virtualenv, or via a Docker image. GPU support is limited to Linux and requires the installation of NVIDIA's proprietary CUDA toolkit.

The getting-started tutorial provides this simple example of defining a problem in TensorFlow:

    import tensorflow as tf
    import numpy as np

    x_data = np.random.rand(100).astype("float32")
    y_data = x_data * 0.1 + 0.3

    W = tf.Variable(tf.random_uniform([1], -1.0, 1.0))
    b = tf.Variable(tf.zeros([1]))
    y = W * x_data + b

This example generates a set of (x, y) points on the line y = 0.1x + 0.3, then creates a data-flow graph that tries to discover the values of W and b by doing a least-squares regression:

    # Minimize the mean squared errors.
    loss = tf.reduce_mean(tf.square(y - y_data))
    optimizer = tf.train.GradientDescentOptimizer(0.5)
    train = optimizer.minimize(loss)

    # Before starting, initialize the variables.  We will 'run' this first.
    init = tf.initialize_all_variables()

    # Launch the graph.
    sess = tf.Session()
    sess.run(init)

    # Fit the line.
    for step in xrange(201):
        sess.run(train)
        if step % 20 == 0:
            print step, sess.run(W), sess.run(b)

Of primary interest here is that the TensorFlow variables (tf.Variable()) and graph operations (tf.reduce_mean(), tf.train.GradientDescentOptimizer(), and optimizer.minimize()) are all created first, after which a tf.Session is opened. The actual training of the graph does not begin until the sess.run(train) line.

As a rule, TensorFlow expects the user to set up their data-flow graph so that its output is an error calculation of some sort. The graph can then be "trained" by feeding it data and iteratively refining parameters until the error calculation is minimized. GradientDescentOptimizer() is one of TensorFlow's many available optimizers. Each offers a different approach to iteratively refining some calculation; here, the code uses gradient descent to minimize the mean squared error. TensorFlow's other optimizers include AdaGrad, Momentum, and follow-the-regularized-leader (FTRL). Which optimizer is the best fit for any given problem is where the real machine-learning experts get involved; TensorFlow merely provides the tools to do the job.
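Switching optimizers amounts to a one-line change in the graph definition. The following sketch rebuilds the tutorial's loss function and trains it with AdaGrad instead of plain gradient descent; the class names tf.train.AdagradOptimizer, tf.train.MomentumOptimizer, and tf.train.FtrlOptimizer are assumed from the same era's API:

    import numpy as np
    import tensorflow as tf

    x_data = np.random.rand(100).astype("float32")
    y_data = x_data * 0.1 + 0.3

    W = tf.Variable(tf.random_uniform([1], -1.0, 1.0))
    b = tf.Variable(tf.zeros([1]))
    loss = tf.reduce_mean(tf.square(W * x_data + b - y_data))

    # Only the optimizer line differs from the earlier example:
    train = tf.train.AdagradOptimizer(0.5).minimize(loss)
    # train = tf.train.MomentumOptimizer(0.5, momentum=0.9).minimize(loss)
    # train = tf.train.FtrlOptimizer(0.5).minimize(loss)

    sess = tf.Session()
    sess.run(tf.initialize_all_variables())
    for step in range(201):
        sess.run(train)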

In addition to the optimizers, there is a set of functions for building neural-network graphs, which emulate neurons and signal processing. General machine learning does not require neural networks, but they are one of the most recognizable "artificial intelligence" approaches. That said, TensorFlow also provides a sizable library of functions that can be used for numerical computation that does not involve deep learning; they include matrix operations, functions to reduce the dimensionality of tensors, functions to compare or index subsets of tensors, and functions for image processing. There is even a special set of functions for operating on sparse tensors, duplicating the API for general tensors but (hopefully) consuming far fewer computational resources.
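A sketch of how those pieces fit together, again assuming the early Python API: a single fully connected layer is just a matrix multiplication passed through an activation function from the neural-network module, and the general-purpose reduction and indexing functions apply to its output like any other tensor:

    import numpy as np
    import tensorflow as tf

    x = tf.constant(np.random.rand(100, 784).astype("float32"))
    W = tf.Variable(tf.random_uniform([784, 10], -1.0, 1.0))
    b = tf.Variable(tf.zeros([10]))

    # One layer of "neurons": matrix multiply, add a bias, then apply a
    # rectified-linear activation from the tf.nn module.
    layer = tf.nn.relu(tf.matmul(x, W) + b)

    # The general tensor functions work on the result as well.
    per_row_max = tf.reduce_max(layer, 1)   # largest activation per input
    predictions = tf.argmax(layer, 1)       # index of that activation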

TensorFlow provides a syntax for specifying the "compute device" on which an operation is run. Specifying

    with tf.device('/cpu:0'):
      # graph definition

designates the system's CPU as the target processor for the graph. The other options are /gpu:0 through /gpu:n, which designate the system's GPUs as the targets (assuming that the optional CUDA support was enabled, that is). Although manually assigning operations to specific processors is supported, TensorFlow also performs some placement optimization on the user's behalf when devices are not specified in the graph. Many matrix operations, for example, will automatically run on the GPU rather than the CPU if CUDA support is available. Large graphs can also set up queues to further parallelize computation; there is a full queue-management API.
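Here is a short sketch of mixing devices in one graph; the /gpu:0 placement assumes a CUDA-enabled build, and the allow_soft_placement option lets TensorFlow fall back to the CPU if the requested device is unavailable:

    import tensorflow as tf

    with tf.device('/cpu:0'):
        a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
        b = tf.constant([[1.0, 1.0], [0.0, 1.0]])

    with tf.device('/gpu:0'):
        # Pin the matrix multiplication to the first GPU.
        c = tf.matmul(a, b)

    # Log which device each operation actually ran on, and fall back
    # gracefully if no GPU is present.
    sess = tf.Session(config=tf.ConfigProto(log_device_placement=True,
                                            allow_soft_placement=True))
    print(sess.run(c))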

In fact, the only real limitation to TensorFlow's parallel-programming support is that the library is currently limited to a single machine. For dealing with mere tens of thousands of rows in a tensor, that may not seem like a significant shortcoming, but for cutting-edge research it appears to be a point of controversy. Machine-learning research student Matt Mayo contended in his review of TensorFlow that the lack of distributed-computation support makes the release rather underwhelming: little different from other open-source machine-learning libraries like Theano.

The other issue is that, as of now, TensorFlow has only Python and C++ interfaces. The project site says it hopes the community will contribute interfaces for other languages. As anyone with experience in releasing large open-source projects will tell you, though, such hopes are easy to express but do not automatically spawn many contributions.

The project is new enough that the research community needs to spend some time investigating it before a real consensus can be reached. The advertised premise of TensorFlow is that Google uses it both as a research tool and in production; there is no need to rewrite a trained graph before it is deployed.

For those new to the field of machine learning (which certainly includes me), TensorFlow represents a well-organized library with which it is possible to get started tackling machine-learning problems. The documentation is thorough and includes well-written tutorials. Whether TensorFlow takes off as an independent project, however, will depend far more on how serious researchers and developers take to it than on whether novices can get to the end of the tutorials.

A comment on Mayo's article claims that Google plans to release an update supporting distributed computation in mid-2016. By that time, we should have a clearer picture of how useful the machine-learning community finds TensorFlow at solving problems outside of Google.

