February 13, 2009
This article was contributed by Nathan Willis
For simplicity's sake, I keep all of my general-purpose boxes running
the same Linux distribution. That minimizes conflicts when sharing
applications and data, but every substantial upgrade means downloading the
same packages multiple times — taking a toll on bandwidth. I used to use
apt-proxy to intelligently cache downloaded packages for all the machines
to share, but there are alternatives: apt-cacher, apt-cacher-ng, and
approx, as well as options available for RPM-based distributions. This
article will take a look at some of these tools.
The generic way
Since Apt and RPM use HTTP to move data, it is possible to speed up
multiple updates simply by using a caching Web proxy like Squid. A transparent
proxy sitting between your LAN clients and the Internet requires no changes
to the client machines; otherwise you must configure Apt and RPM to use the
proxy, just as you must configure your Web browser to redirect its
requests. In each case, a simple change in the appropriate configuration
file is all that is required: /etc/apt/apt.conf.d/70debconf or
/etc/rpmrc, for example.
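Assuming a Squid proxy running at 192.168.1.100 on its default port of
3128 (both values are illustrative), the Apt side needs only one line in a
file under /etc/apt/apt.conf.d/:
Acquire::http::Proxy "http://192.168.1.100:3128/";
while rpmrc has traditionally accepted equivalent httpproxy and httpport
settings:
httpproxy: 192.168.1.100
httpport: 3128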
Although straightforward, this technique has its drawbacks. First, a
Web proxy will not recognize that two copies of a package retrieved from
different URLs are identical. That undermines the scheme for RPM-based
distributions like Fedora, where Yum's built-in mirror selection means
successive requests for the same package may go to different hosts,
producing cache misses and duplicate downloads.
Second, using the same cache for packages and all other HTTP traffic
risks cache churn. A very large upgrade, such as a release change rather
than a round of individual package updates, can fill the proxy's cache,
and cached packages can be evicted by ordinary web traffic if the LAN-wide
upgrade takes too long. It is better to keep software
updates and general web traffic separate.
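With Squid, one way to do that is to run a second instance dedicated to
package traffic. A minimal squid.conf sketch (port, path, and sizes here
are illustrative, not tuned recommendations) that raises the object-size
limit so that large packages are actually cached:
http_port 3129
cache_dir ufs /var/spool/squid-packages 20000 16 256
# Squid's default maximum_object_size of 4 MB would exclude most packages
maximum_object_size 1048576 KB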
Apt-proxy versus apt-cacher
The grand-daddy of the Apt caching proxies is apt-proxy. The current
revision is written in Python and uses the Twisted framework. Complaints about
apt-proxy's speed, memory usage, and stability spawned the creation of apt-cacher, a
Perl-and-cURL-based replacement that can
run either as a stand-alone daemon or as a CGI script on a web server.
Both operate by running as a service and accepting
incoming Apt connections from client machines on a high-numbered TCP port:
9999 for apt-proxy, 3142 for apt-cacher.
Apt-proxy is configured in the file
/etc/apt-proxy/apt-proxy-v2.conf.
In this file, one sets up a section for each Apt repository that will
be accessed by any of the machines using the proxy service. The syntax
requires assigning a unique alias to each section along with listing one or
more URLs for each repository. On each client machine, one must change the
repository information in /etc/apt/sources.list, altering each line to
point to the apt-proxy server and the appropriate section alias that was
assigned in /etc/apt-proxy/apt-proxy-v2.conf.
For example, consider an apt-proxy server running on 192.168.1.100. If
the original repository line in a client's sources.list is:
deb http://archive.ubuntu.com/ubuntu/ intrepid main
it would instead need to read:
deb http://192.168.1.100:9999/ubuntubackend intrepid main
The new URL points to the apt-proxy server on 192.168.1.100, port 9999,
and to the section configured with the alias ubuntubackend.
The apt-proxy-v2.conf file would contain an entry such as:
[ubuntubackend]
backends = http://archive.ubuntu.com/ubuntu/
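A slightly fuller sketch of the file, assuming the stock option names
(port, cache_dir, max_age) and with illustrative values:
[DEFAULT]
; where apt-proxy listens and keeps its cache
port = 9999
cache_dir = /var/cache/apt-proxy
; expire unused package versions after 120 days
max_age = 120
[ubuntubackend]
backends = http://archive.ubuntu.com/ubuntu/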
If you find that syntax confusing, you are not alone. Apt-proxy
requires detailed configuration on both the server and client sides: it
forces you to invent aliases for all existing repositories, and to edit
every repository line in every client's sources.list.
Apt-cacher is notably simpler to configure. Although a swath of
options is available in apt-cacher's server configuration file
/etc/apt-cacher/apt-cacher.conf, the server does not
need to know about all of the upstream Apt repositories that clients
will access. Configuring the clients is enough to establish a working
proxy. On the client side, there are two options: either rewrite
the URLs of the repositories in each client's sources.list, or activate
Apt's existing proxying in /etc/apt/apt.conf. But
choose one or the other; you cannot do both.
To rewrite entries in sources.list, one merely prepends the address of
the apt-cacher server to the URL. So
deb http://archive.ubuntu.com/ubuntu/ intrepid main
becomes:
deb http://192.168.1.100:3142/archive.ubuntu.com/ubuntu/ intrepid main
Alternatively, leave the sources.list untouched, and edit apt.conf,
inserting the line:
Acquire::http::Proxy "http://192.168.1.100:3142/";
Ease of configuration aside, the two tools are approximately equal under
basic LAN conditions. Apt-cacher does offer more options for advanced
usage, including restricting access to specific hosts, logging,
rate-limiting, and cache maintenance. Both tools allow importing
existing packages from a local Apt cache into the cache shared by all
machines.
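For illustration, a sketch of those knobs in apt-cacher.conf, assuming
the stock option names and with example values:
daemon_port=3142
cache_dir=/var/cache/apt-cacher
# only answer requests from the local subnet
allowed_hosts=192.168.1.0/24
# throttle downloads from upstream mirrors; see the comments in the
# shipped file for the accepted units
limit=250
# produce usage reports from the access logs
generate_reports=1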
Much of the criticism of the tools observed on mailing lists or web
forums revolves around failure modes, for example whether Twisted or cURL
is more reliable as a network layer. But there are telling discussions
from experienced users of both that highlight differences you would rather
not experience firsthand.
For example, this
discussion includes a description of how apt-proxy's simplistic cache
maintenance can lose a cached package: If two clients download different
versions of the same package, the earlier downloads will expire
from the cache because apt-proxy does not realize that keeping both
versions is desirable. If you routinely test unstable packages on one but
not all of your boxes, such a scenario could bite you.
Other tools for Apt
Although apt-proxy and apt-cacher get the most attention, they are not
the only options.
Approx is intended as a
replacement for apt-proxy, written in Objective Caml with an
emphasis on simplicity. As with apt-proxy, client-side configuration
involves rewriting the repository lines in sources.list. The server-side
configuration is simpler, however: each repository is mapped to a single
alias, with one entry per line.
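A matching /etc/approx/approx.conf entry for the Ubuntu archive used in
the earlier examples would be just (the alias is arbitrary):
ubuntu    http://archive.ubuntu.com/ubuntu
after which, assuming approx's default port of 9999, a client's
sources.list line becomes:
deb http://192.168.1.100:9999/ubuntu intrepid main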
Apt-cacher-ng is
designed to serve as a drop-in replacement for apt-cacher, with
multi-threading and HTTP pipelining added for better speed.
The server runs on the same TCP port, 3142, so transitioning from
apt-cacher to apt-cacher-ng requires no changes on the client side. The
server-side configuration differs, in that it can be split across
multiple external files and can incorporate complicated
remapping rules.
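As a rough sketch based on the sample configuration shipped with the
package, /etc/apt-cacher-ng/acng.conf might contain:
Port: 3142
CacheDir: /var/cache/apt-cacher-ng
LogDir: /var/log/apt-cacher-ng
# merge all Debian mirrors listed in the backends_debian file into a
# single cache pool
Remap-debrep: file:deb_mirror*.gz ; file:backends_debian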
Apt-cacher-ng does not presently provide manpage documentation,
supplying instead a 14-page PDF. Command-line fans may find that
disconcerting. Neither application has supplanted the original utility it
was designed to replace, but both are relatively recent projects. If
apt-proxy or apt-cacher don't do the job for you, perhaps approx or
apt-cacher-ng will.
Tools for RPM
The situation for RPM users is less rosy. Of course, as any packaging
maven will tell you, RPM and Apt are not proper equivalents. Apt is the
high-level tool for managing Debian packages with dpkg. A proper analog on
RPM-based systems would be Yum. Unfortunately, the Yum universe does not
yet have dedicated caching
proxy packages like those prevalent for Apt. That is not for lack of
interest; searching for the appropriate terms digs up threads on Linux
Users' Group mailing lists, distribution web forums, and general-purpose
Linux help sites.
One can, of course, use Apt to manage an RPM-based system, but in most
cases the RPM-based distributions assume that you will use some other tool
designed for RPM from the ground up. In that case, installing and
configuring Apt is a task left to the individual user, whereas Yum comes
pre-configured.
Most of the proposed workarounds for Yum involve some variation of the
general-purpose HTTP proxy solution described above, using Squid or http-replicator.
If you take this road, it is possible to avoid some of the pitfalls of
lumping RPM and general web traffic into one cache by using the HTTP proxy
only for package updates. Just make sure that plenty
of space has been allocated for the cache.
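Pointing Yum at such a proxy takes one line in the [main] section of
/etc/yum.conf, reusing the illustrative proxy address from earlier:
proxy=http://192.168.1.100:3128/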
Alternatively, it is possible to set up a local mirror of the entire
remote repository, either with a tool such as mrepo or piecemeal.
The local repository can then serve all of the clients on the LAN. Note,
however, that this method will maintain a mirror of the entire remote
repository, not just the packages that you download, and that you will have
to update the machine hosting the mirror itself in the old-fashioned
manner.
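Once the mirror is populated, each client needs only a .repo file
pointing at it; a sketch, with hypothetical paths, for a Fedora 10
updates mirror served over HTTP:
[local-updates]
name=Local mirror of Fedora updates
baseurl=http://192.168.1.100/mirror/fedora/updates/10/i386/
enabled=1
gpgcheck=1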
Finally, for the daring, one other interesting
discussion proposes faking a caching proxy by configuring each
machine to use the same Yum cache, shared via NFS. Caveat emptor.
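The recipe amounts to telling Yum to keep what it downloads and mounting
one cache directory everywhere. A sketch, with a hypothetical NFS server:
in the [main] section of /etc/yum.conf on every client:
keepcache=1
cachedir=/var/cache/yum
and in each client's /etc/fstab:
192.168.1.100:/var/cache/yum  /var/cache/yum  nfs  defaults  0  0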
I ultimately went with apt-cacher for this round of upgrades, on the
basis of its simpler configuration and its widespread deployment elsewhere.
Thus far, I have no complaints; the initial update (Ubuntu
boxes moving from 8.04 to 8.10, for the curious) went smoothly. The
machines are now all in sync; time will tell whether subsequent package
updates reveal any problems in the coming months. It's a good thing there
are alternatives.