A look at package repository proxies
For simplicity's sake, I keep all of my general-purpose boxes running the same Linux distribution. That minimizes conflicts when sharing applications and data, but every substantial upgrade means downloading the same packages multiple times — taking a toll on bandwidth. I used to use apt-proxy to intelligently cache downloaded packages for all the machines to share, but there are alternatives: apt-cacher, apt-cacher-ng, and approx, as well as options available for RPM-based distributions. This article will take a look at some of these tools.
The generic way
Since Apt and RPM use HTTP to move data, it is possible to speed up
multiple updates simply by using a caching Web proxy like Squid. A transparent
proxy sitting between your LAN clients and the Internet requires no changes
to the client machines; otherwise you must configure Apt and RPM to use the
proxy, just as you must configure your Web browser to redirect its
requests. In each case, a simple change in the appropriate configuration
file is all that is required: /etc/apt/apt.conf.d/70debconf or
/etc/rpmrc, for example.
Although straightforward, this technique has its drawbacks. First, a Web proxy will not recognize that two copies of a package retrieved from different URLs are identical, undermining the process for RPM-based distributions like Fedora, where the Yum update tool incorporates built-in mirroring.
Second, using the same cache for packages and all other HTTP traffic risks overflowing the cache. Very large upgrades, such as a change of release rather than individual package updates, can fill up the proxy's cache, and downloaded packages can be evicted by ordinary web traffic if your LAN upgrade process takes too long. It is better to keep software updates and general web traffic separate.
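The mirroring drawback is easy to see in a toy model. The sketch below is not how Squid is implemented; it only contrasts a cache keyed on the full URL with one keyed on the package file name. The mirror host names and the package name are invented for illustration.

```python
# Toy model only: contrasts URL-keyed and file-name-keyed caching.
import posixpath
from urllib.parse import urlsplit


def url_key(url: str) -> str:
    """Key used by a generic web proxy: the full URL."""
    return url


def package_key(url: str) -> str:
    """Key a package-aware cache could use: the package file name."""
    return posixpath.basename(urlsplit(url).path)


# Hypothetical mirrors serving the same RPM file.
mirror_a = "http://mirror-a.example.org/fedora/updates/foo-1.0-1.noarch.rpm"
mirror_b = "http://mirror-b.example.net/pub/fedora/updates/foo-1.0-1.noarch.rpm"

# The first client fetched the package from mirror A.
url_cache = {url_key(mirror_a): b"package bytes"}
pkg_cache = {package_key(mirror_a): b"package bytes"}

# The second client is sent to mirror B by the update tool.
hit_by_url = url_key(mirror_b) in url_cache        # different URL: cache miss
hit_by_name = package_key(mirror_b) in pkg_cache   # same file name: cache hit
```

Keying on the file name alone is itself a simplification; a real package-aware cache would also need to verify that the two files really are the same.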
Apt-proxy versus apt-cacher
The grand-daddy of the Apt caching proxies is apt-proxy. The current revision is written in Python and uses the Twisted framework. Complaints about apt-proxy's speed, memory usage, and stability spawned the creation of apt-cacher, a Perl-and-cURL based replacement that can run either as a stand-alone daemon or as a CGI script on a web server. Both operate by running as a service and accepting incoming Apt connections from client machines on a high-numbered TCP port: 9999 for apt-proxy, 3142 for apt-cacher.
Apt-proxy is configured in the file /etc/apt-proxy/apt-proxy-v2.conf. In this file, one sets up a section for each Apt repository that will be accessed by any of the machines using the proxy service. The syntax requires assigning a unique alias to each section along with listing one or more URLs for each repository. On each client machine, one must change the repository information in /etc/apt/sources.list, altering each line to point to the apt-proxy server and the appropriate section alias that was assigned in /etc/apt-proxy/apt-proxy-v2.conf.
For example, consider an apt-proxy server running on 192.168.1.100. If the original repository line in a client's sources.list is:
deb http://archive.ubuntu.com/ubuntu/ intrepid main
It would instead need to read:
deb http://192.168.1.100:9999/ubuntubackend intrepid main
The new URL points to the apt-proxy server on 192.168.1.100, port 9999,
and to the section configured with the alias ubuntubackend.
The apt-proxy-v2.conf file would contain an entry such as:
[ubuntubackend]
backends = http://archive.ubuntu.com/ubuntu/
If you find that syntax confusing, you are not alone. Apt-proxy requires detailed configuration on both the server and client sides: it forces you to invent aliases for all existing repositories, and to edit every repository line in every client's sources.list.
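To make the client-side chore concrete, here is a minimal sketch of the rewrite that each sources.list line needs for apt-proxy. The function name and the `backends` dictionary are hypothetical helpers, not part of apt-proxy, and the sketch assumes the simple `deb URL suite components` line format used in the example above.

```python
def to_apt_proxy(line: str, proxy: str, backends: dict) -> str:
    """Rewrite one sources.list line to use an apt-proxy alias.

    `backends` maps each alias from apt-proxy-v2.conf to its upstream URL.
    Lines that are not deb/deb-src entries, or whose repository has no
    alias, are returned unchanged.
    """
    fields = line.split()
    if len(fields) < 3 or fields[0] not in ("deb", "deb-src"):
        return line
    for alias, upstream in backends.items():
        if fields[1].rstrip("/") == upstream.rstrip("/"):
            fields[1] = f"http://{proxy}/{alias}"
            return " ".join(fields)
    return line


backends = {"ubuntubackend": "http://archive.ubuntu.com/ubuntu/"}
rewritten = to_apt_proxy(
    "deb http://archive.ubuntu.com/ubuntu/ intrepid main",
    "192.168.1.100:9999",
    backends,
)
```

Note that the alias table has to be kept in step with the server's configuration file; a repository missing from it is silently left pointing at the upstream server.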
Apt-cacher is notably simpler in its configuration. Although there is
a swath of options available in apt-cacher's server configuration file
/etc/apt-cacher/apt-cacher.conf, the server does not
need to know about all of the upstream Apt repositories that clients
will access. Configuring the clients is enough to establish a working
proxy. On the client side, there are two options: either rewrite
the URLs of the repositories in each client's sources.list, or activate
Apt's existing proxying in /etc/apt/apt.conf. But
choose one or the other; you cannot do both.
To rewrite entries in sources.list, one merely prepends the address of the apt-cacher server to the URL. So
deb http://archive.ubuntu.com/ubuntu/ intrepid main
becomes:
deb http://192.168.1.100:3142/archive.ubuntu.com/ubuntu/ intrepid main
Alternatively, leave the sources.list untouched, and edit apt.conf, inserting the line:
Acquire::http::Proxy "http://192.168.1.100:3142/";
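If you choose the sources.list route, the rewrite is mechanical enough to script. The following is a rough sketch, not a tool shipped with apt-cacher; the function name is made up, and it assumes plain `http://` repository lines in the simple format shown above.

```python
def to_apt_cacher(line: str, proxy: str) -> str:
    """Prepend the apt-cacher address to the repository URL of one line.

    Non-repository lines, non-HTTP repositories, and lines that already
    point at the proxy are returned unchanged.
    """
    fields = line.split()
    if len(fields) < 3 or fields[0] not in ("deb", "deb-src"):
        return line
    scheme = "http://"
    proxy_prefix = f"{scheme}{proxy}/"
    url = fields[1]
    if not url.startswith(scheme) or url.startswith(proxy_prefix):
        return line
    fields[1] = proxy_prefix + url[len(scheme):]
    return " ".join(fields)


original = "deb http://archive.ubuntu.com/ubuntu/ intrepid main"
rewritten = to_apt_cacher(original, "192.168.1.100:3142")
```

Unlike the apt-proxy case, no alias table is needed: the upstream host name is carried inside the rewritten URL, which is why the server does not have to know about the repositories in advance.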
Ease of configuration aside, the two tools are approximately equal under basic LAN conditions. Apt-cacher does offer more options for advanced usage, including restricting access to specific hosts, logging, rate-limiting, and cache maintenance. Both tools allow importing existing packages from a local Apt cache into the cache shared by all machines.
Much of the criticism of the tools observed on mailing lists or web forums revolves around failure modes, for example whether Twisted or cURL is more reliable as a network layer. But there are telling discussions from experienced users of both that highlight differences you would rather not experience firsthand.
For example, this discussion includes a description of how apt-proxy's simplistic cache maintenance can lose a cached package: If two clients download different versions of the same package, the earlier downloads will expire from the cache because apt-proxy does not realize that keeping both versions is desirable. If you routinely test unstable packages on one but not all of your boxes, such a scenario could bite you.
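The failure is easier to picture with a toy model. The sketch below is not apt-proxy's actual maintenance code; it assumes, purely for illustration, a cleanup policy that keeps only the most recently downloaded file for each package name. The package names and timestamps are invented.

```python
def prune_keep_newest(cache: dict) -> dict:
    """Keep only the most recently downloaded file per package name.

    `cache` maps a Debian-style file name (name_version_arch.deb) to the
    time it was downloaded. This is a deliberately naive policy.
    """
    newest = {}
    for filename, downloaded_at in cache.items():
        name = filename.split("_", 1)[0]
        if name not in newest or downloaded_at > newest[name][1]:
            newest[name] = (filename, downloaded_at)
    return {filename: ts for filename, ts in newest.values()}


# One box tracks a stable release, another tests a newer version.
cache = {
    "foo_1.0-1_all.deb": 100,  # fetched by the stable box
    "foo_1.1-1_all.deb": 200,  # fetched later by the testing box
}
pruned = prune_keep_newest(cache)
```

After pruning, only the 1.1 file remains, so the next stable box that asks for version 1.0 has to download it from the upstream server again.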
Other tools for Apt
Although apt-proxy and apt-cacher get the most attention, they are not the only options.
Approx, written in Objective Caml, is intended as a replacement for apt-proxy and places an emphasis on simplicity. Like apt-proxy, it requires rewriting the repositories in each client's sources.list. The server-side configuration is simpler, however: each repository is mapped to a single alias, with one entry per line.
Apt-cacher-ng is designed to serve as a drop-in replacement for apt-cacher, with the added benefits of multi-threading and HTTP pipelining lending it better speed. The server runs on the same TCP port, 3142, so transitioning from apt-cacher to apt-cacher-ng requires no changes on the client side. The server-side configuration is different, in that the configuration can be split into multiple external files and incorporate complicated remapping rules.
Apt-cacher-ng does not presently provide manpage documentation, supplying instead a 14-page PDF. Command-line fans may find that disconcerting. Neither approx nor apt-cacher-ng has supplanted the original utility it was designed to replace, but both are relatively recent projects. If neither apt-proxy nor apt-cacher does the job for you, perhaps approx or apt-cacher-ng will.
Tools for RPM
The situation for RPM users is less rosy. Of course, as any packaging maven will tell you, RPM and Apt are not proper equivalents. Apt is the high-level tool for managing Debian packages with dpkg. A proper analog on RPM-based systems would be Yum. Unfortunately, the Yum universe does not yet have dedicated caching proxy packages like those prevalent for Apt. It is not because no one is interested; searching for the appropriate terms digs up threads at Linux Users' Group mailing lists, distribution web forums, and general purpose Linux help sites.
One can, of course, use Apt to manage an RPM-based system, but in most cases the RPM-based distributions assume that you will use some other tool designed for RPM from the ground up. In such a case, configuring Apt is likely to be a task left to the individual user, as opposed to a pre-configured Yum setup.
Most of the proposed workarounds for Yum involve some variation of the general-purpose HTTP proxy solution described above, using Squid or http-replicator. If you take this road, it is possible to avoid some of the pitfalls of lumping RPM and general web traffic into one cache by using the HTTP proxy only for package updates. Just make sure that plenty of space has been allocated for the cache.
Alternatively, you can set up a local mirror of the remote repository, either with a tool such as mrepo or piecemeal. The local repository can then serve all of the clients on the LAN. Note, however, that this method maintains a mirror of the entire remote repository, not just the packages that you download, and that you will have to update the machine hosting the mirror itself in the old-fashioned manner.
Finally, for the daring, one other interesting discussion proposes faking a caching proxy by configuring each machine to use the same Yum cache, shared via NFS. Caveat emptor.
I ultimately went with apt-cacher for this round of upgrades, on the basis of its simpler configuration and its widespread deployment elsewhere. Thus far, I have no complaints; the initial update went smoothly — Ubuntu boxes moving from 8.04 to 8.10, for the curious. The machines are now all in sync; time will tell whether or not additional package updates will reveal additional problems in the coming months. It's a good thing there are alternatives.
| Index entries for this article | |
|---|---|
| GuestArticles | Willis, Nathan |
