How The Backup Process Has Changed

Backing up data stored on computers is one of the most important jobs of a systems administrator. A regular backup routine can save large amounts of heartache and frustration when a disk drive or system fails. Disk failure should be treated as something that is guaranteed to happen in the life (and death) of every disk drive. And disk failures always seem to happen at the worst possible moment: on a Friday afternoon just before a vacation starts, or when the boss comes into your office demanding the critical report that lives on the machine with smoke curling out of its power supply.

Over the years, your author has lived through many backup technologies. In the early days of home computing and CP/M systems, floppy-to-floppy transfer was the only way to save data. Floppies were unreliable, so multiple copies were important. When hard drives became normal hardware on DOS-based microcomputers, backups were performed on piles of floppy disks or on short-lived tape technologies. It was a bad day when floppy disk 29 of a batch of 30 encountered a read error during the restoration of a disk.

Mainframe systems in the early 1980s required copying the contents of washing-machine-sized disk drives to piles of 9-track open-reel tapes. As drives were added, the piles of tapes grew, and large storage areas were needed to hold racks of tapes.

The 1990s brought larger disk drives, and the capabilities of mainframes and PCs were converging. Single filesystems could be copied to DC100 serpentine tape cartridges, if one had a lot of patience. Helical-scan SCSI tape drives such as Exabyte 8mm and DDS 4mm could store the contents of multiple filesystems on one tape. For a brief while, tape capacity surpassed filesystem size. Robotic tape libraries could be programmed to automate the backup process and copy large numbers of filesystems to stacks of tapes.
Disk capacities continued to expand rapidly. AIT tapes were good for larger backups, but the media was pricey. RAID arrays became a good way to increase storage capacity and improve reliability, but downtime could be long in the event of a controller failure, so backing up RAID arrays is still critically important. Disk drive prices continued to fall. At some point after the year 2000, the price/performance of disks versus tapes made it more economical to buy another disk drive to copy data to. For the moment, it appears that the disk/tape competition is over and disks have won. With a removable drive sled or a USB drive, a hard drive can now be treated as a high-speed, random-access data cartridge. With multiple machines online, one machine can serve as a backup repository for another's data. Today, it is possible to buy a 300GB disk drive for just over $100; larger drives can be had for a slightly higher cost per byte.

On a Linux platform, two of the oldest and most common backup utilities are dump and tar. Both work with tape and disk-based archives. Dump has the advantage of being able to search the contents of an archive and pick individual files to restore without reading the entire medium. Unfortunately, the dump archive format has gone through a lot of changes. This means, for example, that a dump archive created on a Red Hat 7.3 system may be unreadable on an Ubuntu 7.04 system. Old tar files are more likely to be readable across systems of different vintages, so your author decided to standardize on tar-based backups.
Now for some current real-world examples of disk-to-disk backups. Here's how to use dump to copy the local / filesystem to a compressed, datestamped file on the same machine's /backup filesystem:

    cd /backup
    /sbin/dump 0ufa - / | bzip2 > ./localslash`date +%Y%m%d`.bz2

Here's how to use tar to do the same type of local-to-local backup. The --one-file-system option keeps GNU tar on the / filesystem, as dump does, so it does not descend into /proc, /sys or the /backup filesystem that holds the archive being written:

    cd /backup
    /bin/tar --one-file-system -cf - / | bzip2 > ./localslash`date +%Y%m%d`.tar.bz2

Here's how dump is used to back up the / filesystem on a machine called remote to the local machine's /backup partition:

    cd /backup
    ssh remote '/sbin/dump 0ufa - / | bzip2' > ./remoteslash`date +%Y%m%d`.bz2

Here's how to use tar to do the remote-to-local backup:

    cd /backup
    ssh remote '/bin/tar --one-file-system -cf - / | bzip2' > ./remoteslash`date +%Y%m%d`.tar.bz2

The above commands should be run from the root account, and the remote backups can work without passwords if ssh keys are set up correctly. The ssh client and the ssh server must be installed and configured on the machines. Consult the dump and tar manual pages for more information on the various command options.

Restoring a filesystem involves using bunzip2 to uncompress the archive, then restore (for dump) or tar (for tar) to extract the contents onto a local disk. Restoration across the network is possible with the use of ssh.

A good backup scheme should be devised. Your author has a dedicated backup machine with a large disk drive and an old DDS3 tape drive, which is used to back up all of his other machines. Variations on the above examples are used in several machine-specific scripts that back up one machine at a time. The backups are performed several times a month. Backups can be copied from the backup machine's disk to tape for offsite storage, and the entire backup set is occasionally copied to another machine's large disk for redundancy. Datasets can also simply be copied with cp to removable media.
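The tar commands above can be wrapped in a small script that also checks that the archive actually restores. What follows is a minimal sketch, assuming GNU tar: backup_tree and restore_tree are hypothetical helper names, the demo runs on throwaway directories rather than on /, and gzip is substituted if bzip2 isn't installed.

```shell
#!/bin/sh
# Sketch: datestamped tar backup of one directory tree, then a test restore.
# backup_tree and restore_tree are hypothetical helpers; assumes GNU tar.

# Prefer bzip2 as the article does; fall back to gzip if it is missing.
if command -v bzip2 >/dev/null 2>&1; then
    COMPRESS=bzip2; EXT=bz2
else
    COMPRESS=gzip; EXT=gz
fi

# backup_tree SRC DESTDIR PREFIX: writes DESTDIR/PREFIXyyyymmdd.tar.EXT
# and prints the archive path.
backup_tree() {
    out="$2/$3$(date +%Y%m%d).tar.$EXT"
    # --one-file-system: stay on SRC's filesystem, as dump does.
    tar -C "$1" --one-file-system -cf - . | "$COMPRESS" > "$out" || return 1
    echo "$out"
}

# restore_tree ARCHIVE DESTDIR: uncompress and unpack into DESTDIR.
restore_tree() {
    mkdir -p "$2" &&
        "$COMPRESS" -dc "$1" | tar -C "$2" -xf -
}

# Demo on temporary directories instead of the real / filesystem.
WORK=$(mktemp -d)
trap 'rm -rf "$WORK"' EXIT
mkdir -p "$WORK/data/sub" "$WORK/backup"
echo "critical report" > "$WORK/data/sub/report.txt"

ARCHIVE=$(backup_tree "$WORK/data" "$WORK/backup" localdata)
restore_tree "$ARCHIVE" "$WORK/restored"
cmp "$WORK/data/sub/report.txt" "$WORK/restored/sub/report.txt" &&
    echo "restore OK: $ARCHIVE"
```

Restoring into a scratch directory and comparing files is a cheap way to find out that an archive is unreadable before the day it is actually needed.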
A 100GB+ audio archive is managed differently from standard filesystems: the rsync command is used to clone the data from one machine to another. In the early 1990s, your author couldn't imagine ever getting close to filling up a 9GB disk drive. Then came audio archives, digital cameras with movie modes and other large data sources. Several hard drive failures and machine meltdowns have occurred, but no data has been lost. With a little planning, your data can be kept safe.
How The Backup Process Has Changed
Posted Nov 29, 2007 2:06 UTC (Thu) by jimparis (subscriber, #38647)

A big problem with the dump/tar method of making backups is that you can't rely on them if the disk is in use. Fixes can be tricky depending on what's going on. If you're backing up a database, you can dump it first and back up only the dump, not the database files themselves. For mail spools, you might want to acquire the necessary locks when reading the file so that you don't back up a partially delivered mail. Etc. In some cases LVM snapshots might help, but you still might take a snapshot at a very unlucky time. Of course, none of that's a problem if you can just take the disk offline or make it read-only before backing it up.
How The Backup Process Has Changed
Posted Nov 29, 2007 3:08 UTC (Thu) by brouhaha (subscriber, #1698)

> In some cases LVM snapshots might help, but you still might take a snapshot at a very unlucky time.

Microsoft has addressed this issue in VSS ("Volume Shadow copy Service") for XP and Vista by providing an API through which applications are notified when a snapshot is about to be taken and when it has completed, so that those applications can force their on-disk data structures to be consistent (and possibly fully up to date). This is used by applications such as MS SQL Server and by backup programs such as Ghost.

Perhaps it's time for something similar to be done for Linux (and, for that matter, BSD, Solaris, etc.). In principle, this could be done entirely in user space, with the backup program using an existing IPC mechanism such as D-Bus to notify the applications that it is about to take an LVM snapshot. Applications that are interested (e.g., MySQL, Postgres, etc.) would listen for those messages and send back one response saying they are getting ready and another when they are done. The backup program would send another message to the applications once the snapshot has been taken.

When the backup program sends the initial "about to snapshot" message, it would set a timer. If no "application is interested" messages arrived within the interval, it would proceed. If any "application is interested" messages were received, it would set a second timer to wait for "application is ready" messages.

Note that this kind of mechanism should ONLY be used around operations that can be done quickly, like taking LVM snapshots. It's of no benefit if you try to use it to wrap use of dump or tar to write the active filesystem.
How The Backup Process Has Changed
Posted Nov 29, 2007 4:44 UTC (Thu) by njs (subscriber, #40338)

> Applications that are interested (e.g., MySQL, Postgres, etc.) would listen for those, and send back a response that they are getting ready, and another when they are done.

So, I was thinking about this, and I can't actually figure out why an app like MySQL or Postgres would care about whether a snapshot was pending. Unless you have MySQL's famous I-don't-care-about-my-data mode turned on, in these systems all committed transactions are already on disk (and will be included in the snapshot), and all uncommitted transactions are not (and will not be included in the snapshot). And the RDBMS has no say in which transactions are committed and which are not. So... what would they actually do if they did receive this message? (And more disturbingly, what does MSSQL think it has to do?)
How The Backup Process Has Changed
Posted Nov 29, 2007 10:20 UTC (Thu) by james (subscriber, #1325)

Put it another way -- a database that cannot be restored reliably from a point-in-time snapshot cannot be restored reliably if the system crashes. This may be acceptable if there is a suitably reliable way of recreating the state of a database from external data. Usually there isn't. See also crash-only software: if it's worth putting data into a database, it's worth being sure that the database is really crash-proof.
How The Backup Process Has Changed
Posted Dec 1, 2007 8:31 UTC (Sat) by alankila (subscriber, #47141)

I second this. As long as the database performs its data-structure updates through journalled techniques, it should not matter when you take the snapshot. Maybe MSSQL isn't using proper journalling, in the interests of higher performance? Journals tend to incur the cost of having to write the same data twice, once to the journal and once to the final destination. Of course, I should investigate instead of just spouting off, but I'm hoping someone can tell me what's wrong with this picture, if anything.
How The Backup Process Has Changed
Posted Dec 1, 2007 8:50 UTC (Sat) by brouhaha (subscriber, #1698)

Sorry, I chose my examples poorly. You're correct that MySQL does not need anything special to force on-disk consistency. A better example would have been a more typical application such as a word processor or CAD program. If the disk snapshot were taken while the application was in the middle of writing a document to disk, the snapshot might not contain a coherent version of that document. Of course, this depends a lot on exactly what document-writing strategy the application uses. If the application writes to a new file, then moves it to the original name, the snapshot is guaranteed to have a coherent version of the original file, the new file, or possibly both. Applications that rewrite an existing file in place are problematic.
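The "write a new file, then move it to the original name" strategy described above can be sketched in a few lines of shell. Because a rename within one filesystem is atomic, a snapshot taken at any moment sees either the old file or the new one, never a half-written mixture. atomic_save is a hypothetical helper name used only for illustration.

```shell
#!/bin/sh
# Sketch of the write-new-file-then-rename save strategy.
# atomic_save is a hypothetical helper, not a standard command.

# atomic_save FILE: copy stdin into FILE without ever exposing a partial file.
atomic_save() {
    dir=$(dirname "$1")
    # The temporary file must be in the same directory (same filesystem);
    # across filesystems mv degrades to copy+delete and is not atomic.
    # Note: mktemp creates it with mode 0600; a real editor would also
    # copy the original file's permissions.
    tmp=$(mktemp "$dir/.save.XXXXXX") || return 1
    if cat > "$tmp"; then
        mv -f "$tmp" "$1"    # rename(2): readers see old or new, never half
    else
        rm -f "$tmp"
        return 1
    fi
}

WORK=$(mktemp -d)
trap 'rm -rf "$WORK"' EXIT
printf 'version 1\n' > "$WORK/doc.txt"
printf 'version 2\n' | atomic_save "$WORK/doc.txt"
cat "$WORK/doc.txt"
```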
How The Backup Process Has Changed
Posted Dec 1, 2007 20:41 UTC (Sat) by njs (subscriber, #40338)

That might explain why Windows needs this -- on unix, the atomic saving process you describe is pretty common (is it ubiquitous? I know emacs uses it, but no idea about, say, openoffice). On Windows, though, IIRC, the filesystem semantics are such that atomically saving a file is impossible.
How The Backup Process Has Changed
Posted Nov 29, 2007 12:20 UTC (Thu) by ayeomans (subscriber, #1848)

CoW (Copy on Write) filesystems maybe? They would seem to solve most issues with files being modified, and also make it possible to get back old deleted versions. And if they could also gracefully handle file and directory renaming and moving, that would be truly wonderful.
How The Backup Process Has Changed
Posted Nov 30, 2007 4:51 UTC (Fri) by njs (subscriber, #40338)

LVM is a sledgehammer approach to getting a CoW filesystem. They're essentially identical in principle, just different trade-offs in manageability, etc.
How The Backup Process Has Changed
Posted Nov 29, 2007 4:31 UTC (Thu) by njs (subscriber, #40338)

LVM snapshots really do essentially solve this problem in practice, as far as I can see. It's true that they're not as nice as gracefully quiescing the system, but the difference is not that large. LVM essentially simulates pulling the plug on the machine and then copying the (now quiescent!) drive. This can create inconsistencies of various sorts, but -- this is the important trick -- all the programs that worry about data consistency are already designed to handle power failures, so they already have code to handle stale locks, journal rollbacks, etc. That's *much* better than just backing up the live system, which can give you wildly inconsistent data (a full backup can take hours from start to finish), is basically guaranteed to trash any in-use databases, etc.
How The Backup Process Has Changed
Posted Nov 29, 2007 14:51 UTC (Thu) by nix (subscriber, #2304)

Of course snapshotting the root filesystem is not supported, which makes this less useful than it might be, at least for whole-system backups.
How The Backup Process Has Changed
Posted Nov 29, 2007 16:36 UTC (Thu) by jimparis (subscriber, #38647)

Works fine for me (I don't see why LVM would care where the filesystem is mounted):
$ mount | head -1
/dev/raid1/root on / type ext3 (rw,errors=remount-ro)
$ sudo lvcreate --size 100m --snapshot --name snap /dev/raid1/root
Logical volume "snap" created
$ sudo lvdisplay /dev/raid1/snap | grep -i status
LV snapshot status active destination for /dev/raid1/root
LV Status available
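The snapshot-then-copy approach discussed in this thread can be sketched as a script. This is a hedged sketch: the volume names come from the lvcreate example above, while the mount point /mnt/snap and the archive path under /backup (assumed to be a separate filesystem) are assumptions. It needs root and real LVM volumes, so by default it only prints the commands it would run (DRY_RUN=1); given the deadlock concern raised in the replies, it is safer to point it at a volume other than the one LVM itself lives on.

```shell
#!/bin/sh
# Sketch: back up a logical volume through a short-lived LVM snapshot.
# ORIGIN and SNAP_NAME follow the lvcreate example above; MNT and OUT are
# assumptions. DRY_RUN=1 (the default here) only prints the commands.
set -e
DRY_RUN=${DRY_RUN:-1}
ORIGIN=${ORIGIN:-/dev/raid1/root}
SNAP_NAME=${SNAP_NAME:-snap}
SNAP_DEV=$(dirname "$ORIGIN")/$SNAP_NAME
MNT=${MNT:-/mnt/snap}
OUT=${OUT:-/backup/snap$(date +%Y%m%d).tar.bz2}

# run CMD...: print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# 1. Freeze a point-in-time view. 100m is the copy-on-write space for
#    blocks that change on the origin while the backup runs.
run lvcreate --size 100m --snapshot --name "$SNAP_NAME" "$ORIGIN"
# 2. Mount the frozen view read-only and archive it at leisure.
run mkdir -p "$MNT"
run mount -o ro "$SNAP_DEV" "$MNT"
run tar -C "$MNT" -cjf "$OUT" .
# 3. Unmount and drop the snapshot before its CoW space fills up.
run umount "$MNT"
run lvremove -f "$SNAP_DEV"
```

If the origin volume changes by more than the reserved copy-on-write space while the archive is being written, the snapshot becomes invalid, so the --size value has to be chosen with the backup's duration in mind.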
How The Backup Process Has Changed
Posted Nov 29, 2007 17:17 UTC (Thu) by nix (subscriber, #2304)

It works, but there's no guarantee that it won't deadlock. The problem is that LVM might need to read configuration state or executable pages (or write config backups) on the root filesystem, but the process of creating a snapshot includes a (brief) period when the origin volume is suspended, so reads and writes to it will block. The deadlock potential is, I hope, obvious.
How The Backup Process Has Changed
Posted Nov 29, 2007 19:31 UTC (Thu) by jimparis (subscriber, #38647)

OK, so copy static LVM binaries and configuration to a temporary ramdisk and run them from there. There's nothing special about "the root filesystem", there's just something special about "the filesystem that lvm lives on". And don't snapshot an active swap, that could get ugly :)
How The Backup Process Has Changed
Posted Nov 30, 2007 1:33 UTC (Fri) by nix (subscriber, #2304)

You might need to modify it to write its config backups and things somewhere else (preferably make it configurable at runtime). But yes, if you can avoid those problems then that might work, and thanks to tmpfs pretty much everyone has the moral equivalent of a ramdisk within easy reach.

(Snapshotting an active swap partition is just barmy. Snapshotting a filesystem containing an active swapfile is careless and risky, but thankfully swapfiles tend to get used only for short-term oh-shit-we-need-another-X-Gb-of-swap-right-now stuff, at least in my experience. They're not something you habitually run with for ages.)
How The Backup Process Has Changed
Posted Nov 30, 2007 4:56 UTC (Fri) by njs (subscriber, #40338)

> thankfully swapfiles tend to get used only for short-term oh-shit-we-need-another-X-Gb-of-swap-right-now stuff, at least in my experience. They're not something you habitually run with for ages.

Wandering *way* off topic, is there actually any reason we don't all use swap files these days, other than inertia? They certainly allow more flexible on-the-fly configuration of your swap needs, and I'm not aware of any disadvantages. Seems like a desktop distro optimizing for simplest-thing-that-works would be quite sane to just slap a single partition on the hard disk and then allocate a swapfile in it.
How The Backup Process Has Changed
Posted Nov 30, 2007 8:22 UTC (Fri) by nix (subscriber, #2304)

I'd go with inertia too. Splitting up your fs into more than one big lump still has advantages (putting your data somewhere else allows you to blow away the rest more easily; you can hive off filesystems as a whole onto remote storage slightly more easily; it keeps them safe, to some degree, from each other's corruption; you can mount them read-only; and so on), but IIRC the only advantage of swap partitions these days is that they're guaranteed to be contiguous.
BackupPC
Posted Nov 29, 2007 2:32 UTC (Thu) by HappyCamp (subscriber, #29230)

Personally, I would use BackupPC (http://backuppc.sf.net/). But then again, I am backing up multiple computers.
BackupPC
Posted Nov 29, 2007 4:40 UTC (Thu) by njs (subscriber, #40338)

Second the recommendation of backuppc. It's very easy to use, makes it very easy to back up multiple computers (including non-unix ones, though we don't have any), uses rsync, compresses its storage, manages backup rotation (seriously, if you don't have periodic backups stretching back at least a month, it's not a real backup solution), etc. The key feature for us was that backuppc is smart enough to notice identical files on multiple computers and store them only once; given that we have large, substantially overlapping music collections on our various computers, this gives us effectively 2-3x more space on our backup server. As far as I could tell, it's the *only* free backup system that does this; dirvish had a feature that looked sort of similar, but it turned out to be a dumb hack.

Backuppc is also a quirky, overgrown mess of perl scripts; it's just been hacked on long enough that it probably already has a quirk thrown in for whatever situation you're in. It would be so nice to have a backup solution that was designed properly from the start, though, aiming directly at disk-to-disk backup, with a sane and smart storage backend and a simple API for retrieving files and otherwise managing the store.
How The Backup Process Has Changed
Posted Nov 29, 2007 4:25 UTC (Thu) by paulmfoster (subscriber, #17313)

This recently came up on another list I'm on. One of the answers is rsync, which this author didn't mention. I don't know why. Rsync can do a file-by-file backup, omit certain files, and a variety of other things. But the best part is that it only copies changed bits/changed files. And it will "prune" files on the backup which are no longer resident on the machine being backed up. It doesn't require root privileges, and it can use ssh or rsh as a remote shell if needed. If you need to do a bit-by-bit copy of a partition, no, it won't work. But otherwise it seems the perfect solution for most backup needs. My cron'd rsync backups are far more reliable than my wife's backups on her Windows machine.
How The Backup Process Has Changed
Posted Nov 29, 2007 5:54 UTC (Thu) by jamesh (subscriber, #1159)

Using rsync the way you've described is fine for mirroring your system's current state. But it only gives you one state of the filesystem. How do you recover a file that you deleted/modified but only noticed after a few backup cycles? If that is a requirement, then plain rsync probably isn't enough.
How The Backup Process Has Changed
Posted Nov 29, 2007 6:15 UTC (Thu) by paulmfoster (subscriber, #17313)

Personally, I'm not very forgiving to users in cases like this. If they don't notice the problem before the next backup (tomorrow), it's gone. However, if you want to have that kind of functionality, you could simply designate different backup sites for each day, for example. This would be comparable to doing a separate backup tape for each day of the week, in the old days. FWIW, tar and dump won't provide the functionality you're talking about either, out of the box. And *none* of these options will do "incremental" backups (only those files changed get backed up, and backups consist of only changed files).
How The Backup Process Has Changed
Posted Nov 29, 2007 7:14 UTC (Thu) by chema (subscriber, #32636)

Both dump and tar can do incremental backups. You can control incremental backups in dump with the backup level. In tar's case, you can use the '--after-date DATE' or '--newer DATE' options, which make tar store only files newer than DATE. Moreover, you can combine find with tar to get a "more selected" backup.

BTW, I'm quite surprised no one has mentioned Bacula or Amanda.

-- Chema
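The find-plus-tar combination mentioned above can be sketched as follows, using a stamp file's modification time as "the time of the last backup". The stamp-file names and the incremental helper are assumptions for illustration; --null and -T - are GNU tar options. As the next reply notes, this approach records new and changed files but not deletions.

```shell
#!/bin/sh
# Sketch: incremental backup with find + GNU tar. A stamp file's mtime
# marks the previous run; incremental and the stamp names are hypothetical.
incremental() {   # incremental SRCDIR STAMPFILE ARCHIVE (absolute paths)
    ( cd "$1" && find . -type f -newer "$2" -print0 |
          tar -cf "$3" --null -T - )
}

WORK=$(mktemp -d)
trap 'rm -rf "$WORK"' EXIT
mkdir -p "$WORK/data"
echo old > "$WORK/data/old.txt"
touch -d '2000-01-01' "$WORK/data/old.txt"   # pretend: unchanged since 2000
touch -d '2001-01-01' "$WORK/last-backup"    # pretend: previous backup time
echo new > "$WORK/data/new.txt"              # changed since then

# Take the new stamp *before* scanning, so files modified while the
# backup runs are caught next time instead of slipping through.
touch "$WORK/this-backup"
incremental "$WORK/data" "$WORK/last-backup" "$WORK/incr.tar" &&
    mv "$WORK/this-backup" "$WORK/last-backup"
tar -tf "$WORK/incr.tar"
```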
How The Backup Process Has Changed
Posted Nov 29, 2007 19:03 UTC (Thu) by vmole (subscriber, #111)

Well then, let me promote Bacula (http://www.bacula.org/). It's overkill for backing up a single PC to a spare disk, but once you get more than a few clients, having everything organized in one spot is much nicer than dealing with a bunch of ad-hoc rsync scripts. Bacula supports a wide variety of tapes, tape changers, disk-based volumes, and DVDs. You can schedule full, differential, and incremental backups. You can run scripts on the clients before and after backups (e.g., to dump databases, stop and start processes, etc.). It supports Windoze, OS X, and most unix-like OSes as clients; while you can build (or download) the server for Windows, it's not the main development platform. I used it at work, backing up a wide variety of systems, and use it at home. Very satisfied.
dump incremental backups
Posted Dec 1, 2007 9:10 UTC (Sat) by mennucc1 (subscriber, #14730)

The way 'dump' does incremental backups is superior to what can be achieved by simply using tar '--after-date DATE' or '--newer DATE'. Dump also stores the state of directories, so incremental backups keep a record of which files were deleted.
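For comparison, GNU tar has a dump-like mode, --listed-incremental (-g), which keeps a snapshot file between runs and stores directory listings in each archive. When the archives are extracted in order with --listed-incremental=/dev/null, tar deletes files that had already disappeared when a later archive was made. This is a minimal sketch with hypothetical paths; it requires GNU tar, not BSD tar or BusyBox.

```shell
#!/bin/sh
# Sketch: GNU tar level-0 and level-1 backups with --listed-incremental.
# All paths are hypothetical; requires GNU tar.
WORK=$(mktemp -d)
trap 'rm -rf "$WORK"' EXIT
mkdir -p "$WORK/data"
echo a > "$WORK/data/a.txt"
echo b > "$WORK/data/b.txt"

# Level 0: the snapshot file does not exist yet, so everything is saved.
tar -C "$WORK" -g "$WORK/level0.snar" -cf "$WORK/full.tar" data
sleep 1    # keep timestamps clearly apart in this quick demo

# Level 1: work on a copy of the level-0 snapshot file, so the level-0
# state stays reusable for later level-1 runs.
cp "$WORK/level0.snar" "$WORK/level1.snar"
rm "$WORK/data/b.txt"
echo c > "$WORK/data/c.txt"
tar -C "$WORK" -g "$WORK/level1.snar" -cf "$WORK/incr.tar" data

# Restore in order; -g /dev/null makes tar honour the stored directory
# listings, so b.txt (gone at level 1) is removed again.
mkdir "$WORK/restore"
tar -C "$WORK/restore" -g /dev/null -xf "$WORK/full.tar"
tar -C "$WORK/restore" -g /dev/null -xf "$WORK/incr.tar"
ls "$WORK/restore/data"
```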
How The Backup Process Has Changed
Posted Nov 29, 2007 15:06 UTC (Thu) by paulmfoster (subscriber, #17313)

Well, apologies to all those who thought I knew what I was talking about. Apparently, incremental backups are more widely supported than I thought. I stand corrected. BTW, the mikerubel.org link is outstanding.
How The Backup Process Has Changed
Posted Dec 1, 2007 22:19 UTC (Sat) by giraffedata (subscriber, #1954)

> Personally, I'm not very forgiving to users in cases like this. If they don't notice the problem before the next backup (tomorrow), it's gone.

Odd choice of words. I'd say, "I don't provide backup service to users for cases like this." And I think that's a huge omission. For my personal data, I have lost far more due to corruption of the primary copy than due to the primary copy becoming unreadable: accidental deletion, naive modification of source code, program run amok, dishonest employee, etc. And of those, the majority was not detectable within 24 hours. So my backup systems have always concentrated on being able to get old copies of files back, at the expense of being able to recover from a broken disk drive easily.
How The Backup Process Has Changed
Posted Dec 2, 2007 5:31 UTC (Sun) by paulmfoster (subscriber, #17313)

In my case, my "users" are myself and my wife (our company is just us). And I'm *not* very forgiving of us. All my data loss has been my own stupid goofs. I know right away, and go back to the backup. I'm more concerned with having multiple copies of the same backup, in case of disk failure or lightning strike. But if I had a larger company or an employee who had the expertise to hack things, I'd take a view more like yours.
How The Backup Process Has Changed
Posted Dec 2, 2007 23:04 UTC (Sun) by giraffedata (subscriber, #1954)

About half the time when I destroy data with a stupid goof, the destruction has been backed up by the time I discover my error. Something works fine for years, then I get the bright idea to improve it. A week later, I find out the hard way that it wasn't an improvement. I back up every day.
How The Backup Process Has Changed
Posted Nov 29, 2007 6:43 UTC (Thu) by gera (subscriber, #43819)

You should see rdiff-backup. It uses librsync, and does act like a "real" backup system, preserving copies of modified/deleted files.
How The Backup Process Has Changed
Posted Nov 29, 2007 16:44 UTC (Thu) by daniel-crawford (subscriber, #4099)

In my experience, rdiff-backup is too dependent on Python (version mismatches, odd crashes), and not robust enough. Plain rsync behaves in a much more stable way across OSes and versions, and calling it with a script that uses --backup and --backup-dir (like tridge's 52-week shell script) will give you all the backup window you want.
How The Backup Process Has Changed
Posted Nov 29, 2007 6:51 UTC (Thu) by elama (subscriber, #262)

I'm using:

    --backup-dir=/backup/deleted/$DIR.$TODAY

This gives me a full backup of all current files, plus all changed/deleted files in a separate directory. To me this looks much more convenient than incremental backups.
How The Backup Process Has Changed
Posted Nov 29, 2007 14:25 UTC (Thu) by tcabot (subscriber, #6656)

http://www.mikerubel.org/computers/rsync_snapshots/ shows how to use rsync to do efficient incremental backups.
How The Backup Process Has Changed
Posted Nov 29, 2007 18:19 UTC (Thu) by jschrod (subscriber, #1646)

The answer to your question is dirvish, at http://www.dirvish.org/.
How The Backup Process Has Changed
Posted Nov 29, 2007 15:37 UTC (Thu) by stevan (subscriber, #4342)

While we're not that large, we use a combination of rsync, tar and gpg for our backups. We rsync all systems, including remote ones, nightly, to a structure held on a RAIDed server large enough to handle the volume, each system being backed up by a standardised script with the system name as a parameter. Each filesystem, or rather copy of the filesystem, is then tarred into an archive area, with the required number of days' archives rotating automatically. Midweek, we gpg the tarballs of the critical systems onto removable disks and ship them to offsite storage, with 6 weeks' data held offsite. At the weekends, we rsync the rsynced volumes (if you see what I mean) to a hosting company, which is also where we can do DR. DR recovery is quick, because there's no untarring to be done, and if necessary whole filesystems can be mounted. It's not infinitely extensible, and in practice probably tops out at about 2TB of data, but it works for us.

Thanks for the type of article that makes sysadmins purr with pleasure.

S
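Automatic rotation of dated archives like those described above can be done with find. What follows is a minimal sketch, assuming the archives are named *.tar* and live alone in one directory, with retention expressed in days; rotate_archives and KEEP_DAYS are hypothetical names. Because it deletes purely by modification time, it should only be pointed at a directory that holds nothing but backups.

```shell
#!/bin/sh
# Sketch: delete dated backup archives older than KEEP_DAYS days.
# rotate_archives and KEEP_DAYS are hypothetical names.
KEEP_DAYS=${KEEP_DAYS:-7}

rotate_archives() {   # rotate_archives ARCHIVE_DIR
    # -mtime +N matches files whose age, in whole days, is greater than N.
    find "$1" -maxdepth 1 -type f -name '*.tar*' -mtime +"$KEEP_DAYS" \
        -exec rm -f {} +
}

WORK=$(mktemp -d)
trap 'rm -rf "$WORK"' EXIT
touch -d '2000-01-01' "$WORK/host20000101.tar.gz"   # long expired
touch "$WORK/host-today.tar.gz"                      # fresh
rotate_archives "$WORK"
ls "$WORK"
```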
Remote backups
Posted Nov 29, 2007 6:54 UTC (Thu) by leoc (subscriber, #39773)

Lately I have been looking into offsite backup solutions, and there seems to be quite a lot of choice. One popular, but fairly expensive, solution is rsync.net. They charge a couple of bucks per gig per month, but provide all the protocols you could ever want (including rsync, of course), and they even offer a second level of geo-redundancy for an extra charge. A much cheaper solution would be the Amazon S3 system; however, it only provides a simple SOAP API set, so you have to use specially written client software to get rsync-like functionality. From what I have read, it sounds like a tool called duplicity is the current favourite, as it provides rsync functionality plus GnuPG-based encryption. I'm currently using portable USB-attached hard drives for backup, but I will likely be switching to online backups once I figure out the best approach from a cost and security perspective. Also, apparently Google will be offering something in this market soon.

Another idea someone mentioned to me was to make an agreement with a friend or colleague to mutually grant access to each other's systems to do remote backups. If you know someone you trust enough to do this, it might be the least expensive option.

Another issue with respect to long-term backups is file formats. Any files saved in anything but plain text or perhaps jpg will likely be unreadable after a fairly short amount of time.
Remote backups
Posted Nov 29, 2007 7:21 UTC (Thu) by grahame (subscriber, #5823)

Disks didn't quite win; you're missing the fact that tapes are far superior as something to put into boxes and ship as bulk data storage. Much more resilient.
Remote backups
Posted Dec 1, 2007 15:08 UTC (Sat) by gerv (subscriber, #3376)

I'd second the recommendation for rsync.net. They even do discounts for free software people.

Gerv
How The Backup Process Has Changed
Posted Nov 29, 2007 8:59 UTC (Thu) by PaulWay (subscriber, #45600)

I'd add a mention of DAR here (http://dar.linux.free.fr). Designed as a Disk ARchiver rather than for tape, it incorporates many of the features that one needs when doing backups: compression, encryption, slicing, backup of SELinux and other extended attributes, differential and incremental backups, and a suite of file selection methods. It keeps a catalogue of the files and their locations, so it seeks directly to the file you want, and the catalogue can be kept separately for incremental or differential backups relative to fixed media (e.g. DVDs). It may run counter to the old Unix philosophy of combining small tools into a larger solution, but it avoids the deficiencies in interaction between those parts. Its dependencies are relatively small, it compiles on all architectures AFAIK, and it produces a dar_static file that can easily be put onto your backup media just in case...
How The Backup Process Has Changed
Posted Nov 29, 2007 14:54 UTC (Thu) by nix (subscriber, #2304)

One downside is that unless you configure it with --enable-mode=64, it has *enormous* memory requirements (as in "backing up a million files uses several Gb of RAM"). With --enable-mode=64 it only needs several hundred Mb, which is still rather nasty. But, well, it works. (I use it too, in conjunction with par2 and a huge pile of ugly scripting. The huge pile of ugly scripting appears to be de rigueur for Unix backup systems.)
How The Backup Process Has Changed
Posted Nov 29, 2007 10:48 UTC (Thu) by sitaram (subscriber, #5959)

I wrote a set of articles for a friend of mine about a year ago, and people have told me I should submit them for publication somewhere. I wonder if anyone would care to take a look at http://sitaramc.googlepages.com/mirror-sync-backup-tools-... and comment on them? I'd feel a lot better about submitting them if someone who didn't know me said they were worth it ;-) Please keep in mind that the series was written over a year ago!

Thanks,
Sitaram
How The Backup Process Has Changed
Posted Dec 1, 2007 1:56 UTC (Sat) by kevinbsmith (subscriber, #4778)

Speaking as someone who has heard of, but not attempted to use, most of those tools, this looks like an excellent series.
How The Backup Process Has Changed
Posted Nov 29, 2007 12:49 UTC (Thu) by hjb (subscriber, #25523)

Hi Forrest, why do you use bzip2? It's much too slow to be practical. From my limited point of view, I know nobody who'd use bzip2; gzip is the better choice.

If someone wants a very efficient, secure backup to a hard disk (possibly of several computers), they should look at BoxBackup (http://www.boxbackup.org/). It stores a complete revision history of every file, so in theory it should be possible to restore an arbitrary point in time, although the client currently lacks this feature. I just published an updated version of my article about it (in German) at http://www.pro-linux.de/berichte/boxbackup.html

Regards, hjb
How The Backup Process Has Changed
Posted Nov 29, 2007 17:43 UTC (Thu) by Los__D (subscriber, #15263)

If you need speed, go with gzip (or maybe even just plain tar); if you need space, go with bzip2...
How The Backup Process Has Changed Posted Nov 30, 2007 3:59 UTC (Fri) by sitaram (subscriber, #5959)
I have this bad habit of doing all my research in one shot, as exhaustively as I can,
summarising it for quick reference, and then discarding the raw data :-)
With that caveat, here is a dump of the entry titled "Choosing between LZMA, BZIP2, and GZIP"
in my personal quickref wiki:
---------------------------
LZMA is the new kid on the block: less space and faster decompression (than
BZIP2) at the cost of much, *much* slower compression.
(Default compression levels are GZIP: 6, BZIP2: 9, and LZMA: 7)
Summary
-------
Use none when
- almost all files in the dataset are already compressed (DUH!)
Use GZIP when
- time is more important than space, or
- system memory is very limited, or
- a lot of files in the dataset are already compressed but nowhere near
all of them
Use LZMA when
- space is more important than time, or
- space is important AND the file will be decompressed many times
Benchmark BZIP2, LZMA at level 1 and perhaps LZMA at level 2 when
- both space AND (compression) time are important, and
- you're going to be compressing this same dataset frequently (like a
daily backup script for your email folders)
Otherwise just use BZIP2
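A rough way to run the benchmark suggested above, as a minimal shell sketch. It assumes the `gzip`, `bzip2` and `xz` command-line tools are installed; `xz` is the LZMA-based successor to the `lzma` tool of this era, so `xz -1` and `xz -2` stand in for "LZMA at level 1/2". The function name `compare_compressors` is hypothetical, and timing uses `date +%s`, so it only has one-second resolution: feed it a realistically large tarball of the real dataset.

```shell
#!/bin/sh
# compare_compressors FILE
# Hypothetical helper: prints the compressed size (bytes) and elapsed
# wall-clock seconds for each compressor, to weigh space against time.
compare_compressors() {
    input=$1
    printf '%-10s %12s %8s\n' method bytes seconds
    printf '%-10s %12s %8s\n' none "$(wc -c < "$input")" 0
    for cmd in "gzip -6" "bzip2 -9" "xz -1" "xz -2"; do
        start=$(date +%s)
        # $cmd is deliberately unquoted so "gzip -6" splits into command + level
        size=$($cmd -c < "$input" | wc -c)
        end=$(date +%s)
        printf '%-10s %12s %8s\n' "$cmd" "$size" "$((end - start))"
    done
}

# Example: compare_compressors /tmp/mail-folders.tar
```

Running it against the same dataset every night (the daily email-folder case above) is what makes the one-off benchmark worth doing.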
How The Backup Process Has Changed Posted Dec 8, 2007 1:52 UTC (Sat) by roelofs (subscriber, #2599)
> Use none when
> - almost all files in the dataset are already compressed (DUH!)

Major omission, both here and in the main article: also use none when your backup medium (and/or the path to it, including RAM) may have errors. Both compression and encryption largely destroy any ability to recover data past the error location. (I discovered two bad bits in 1 GB of memory while verifying a backup to DVD+R.)

> Otherwise just use BZIP2

bzip2 is much, much slower than gzip on decompression, too. If it's read-once (or read-none), then that may not matter. But for read-many it's pretty bad. (I have no data on LZMA or other alternatives. Capacity is cheaper than CPU, however.) Greg
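To make that point concrete, here is a small shell sketch (helper names are hypothetical; GNU `tar`, `gzip`, `dd`, `od` and `seq` are assumed) that builds the same archive twice, uncompressed and gzipped, damages one byte in each, and checks what survives. The plain tar still lists every member, with the damage confined to one file's contents; `gzip -t` rejects the compressed copy, so nothing decompressed past the damaged point can be trusted. Tools such as par2, mentioned earlier in this thread, are one way to add redundancy back on top of compressed archives.

```shell
#!/bin/sh
# corrupt_byte FILE OFFSET: overwrite one byte in place to simulate a media error.
# Writes 'X', or 'Y' if the byte already happens to be 'X', so the file always changes.
corrupt_byte() {
    cur=$(dd if="$1" bs=1 skip="$2" count=1 2>/dev/null | od -An -tu1 | tr -d ' ')
    if [ "$cur" = 88 ]; then new=Y; else new=X; fi   # 88 is ASCII 'X'
    printf '%s' "$new" | dd of="$1" bs=1 seek="$2" conv=notrunc 2>/dev/null
}

# make_damaged_archives DIR: build plain.tar and plain.tar.gz from 50 small
# text files under DIR/data, then damage one byte in each archive.
make_damaged_archives() {
    dir=$1
    mkdir -p "$dir/data"
    for i in $(seq 1 50); do
        seq 1 2000 > "$dir/data/file$i.txt"
    done
    tar -cf "$dir/plain.tar" -C "$dir" data
    gzip -c "$dir/plain.tar" > "$dir/plain.tar.gz"
    # Offset 2048 falls inside the first file's contents (after the 512-byte
    # headers for "data/" and that file), not inside a tar header.
    corrupt_byte "$dir/plain.tar" 2048
    # For the compressed copy, hit the middle of the deflate stream.
    corrupt_byte "$dir/plain.tar.gz" $(( $(wc -c < "$dir/plain.tar.gz") / 2 ))
}
```

Afterwards `tar -tf plain.tar` still shows all 51 entries, while `gzip -t plain.tar.gz` reports an error.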
How The Backup Process Has Changed Posted Nov 29, 2007 13:04 UTC (Thu) by canatella (subscriber, #6745)
One annoying thing about backups is that they take time. I use an rsync script with the options --backup and --backup-dir to do incremental backups. I've added it to my /etc/rc0.d directory so that it is executed at shutdown: I can let the script run, back up my local data and remote server data, and the machine shuts down cleanly when all is done.
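A minimal sketch of the kind of script described above, assuming rsync is installed; the function name and paths are hypothetical, and hooking it into shutdown via `/etc/rc0.d` is left out. With `--backup` plus `--backup-dir`, rsync moves any destination file it would overwrite or delete into a separate per-run directory, so the mirror stays current while older versions pile up next to it.

```shell
#!/bin/sh
# mirror_with_history SRC DEST HISTDIR RUN_NAME
# Hypothetical helper: keeps DEST an exact mirror of SRC; files that would be
# overwritten or deleted in DEST are moved into HISTDIR/RUN_NAME instead.
# Use an absolute HISTDIR: rsync treats a relative --backup-dir as relative to DEST.
mirror_with_history() {
    src=$1 dest=$2 hist=$3 run=$4
    mkdir -p "$dest" "$hist"
    # Trailing slash on SRC: copy its contents, not the directory itself.
    rsync -a --delete --backup --backup-dir="$hist/$run" "$src/" "$dest/"
}

# Typical use, one history directory per run:
# mirror_with_history "$HOME" /mnt/backup/current /mnt/backup/history "$(date +%Y%m%d-%H%M%S)"
```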
Dirvish Posted Nov 29, 2007 13:57 UTC (Thu) by fmyhr (subscriber, #14803)
I'll put in a plug for Dirvish: http://www.dirvish.org/ It uses rsync and hard links to efficiently keep multiple versions of a backup set on a single volume. I use it to back up to hard drives that I can remove and store off-site. Here's a convenient "mobile rack" for SATA drives: http://www.addonics.com/products/mobile_rack/aesnapmrsa.asp
Using rsync Posted Nov 29, 2007 14:28 UTC (Thu) by heini (subscriber, #33614)
> Using rsync the way you've described is fine for mirroring your
> system's current state. But it only gives you one state of the
> filesystem. How do you recover a file that you deleted/modified but
> only noticed after a few backup cycles?
>
> If that is a requirement, then plain rsync probably isn't enough.

Yes, it is. However, there are several backup solutions which use rsync under the hood, one of them being rsnapshot (http://www.rsnapshot.org/). The big advantage of using rsync is that restore is very easy (cp, scp or even rsync). Oh, and in case it matters, rsync can back up ACLs, which - to my knowledge - tar still can't handle. Bye... Dirk
Using rsync Posted Dec 2, 2007 17:40 UTC (Sun) by bchapman26 (subscriber, #4565)
I second rsnapshot. This simple script revolutionized backing up my systems. Bye bye tapes, bye bye rotating tapes, no more lengthy restores, and with cron it is automated, with meaningful summaries emailed daily. Something proprietary solutions can't seem to get right.
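Dirvish and rsnapshot both build on the same trick: every backup run looks like a full copy, but files unchanged since the previous run are hard links to the earlier copy and cost no extra space. A minimal sketch of that idea using rsync's `--link-dest` option follows; it is not the actual code of either tool, the function and directory names are hypothetical, and it assumes rsync plus GNU `ln`.

```shell
#!/bin/sh
# snapshot SRC ROOT NAME
# Hypothetical helper: creates ROOT/NAME as a full-looking copy of SRC.
# Files unchanged since the previous snapshot (ROOT/latest) are hard-linked.
# ROOT should be absolute: rsync treats a relative --link-dest as relative to the destination.
snapshot() {
    src=$1 root=$2 name=$3
    mkdir -p "$root"
    if [ -d "$root/latest" ]; then
        # --link-dest: hard-link files identical to those in the previous snapshot
        rsync -a --delete --link-dest="$root/latest/" "$src/" "$root/$name/"
    else
        rsync -a --delete "$src/" "$root/$name/"
    fi
    # Point "latest" at the new snapshot (-n replaces the old symlink itself)
    ln -sfn "$name" "$root/latest"
}

# e.g. from cron: snapshot /home /backup/snaps "$(date +%Y%m%d)"
```

Restoring an old version is then a plain `cp` out of the dated directory, which is the ease of restore Dirk describes; expiring old snapshots is just `rm -rf` on a directory.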
How The Backup Process Has Changed Posted Nov 29, 2007 15:12 UTC (Thu) by jongeek (subscriber, #39233)
Anyone else using subversion or some other version control system? After reading a couple of articles on the subject, I put my home directory into subversion at the beginning of last year. I do a subversion dump every night, and back up the dumps to DVD periodically (you can do it as often as you like). I actually use a rotating set of DVD-RWs, writing them to DVD-R a few times a year. Subversion manages the modified/deleted files problem, and works well with binary files. It doesn't work too well for music/videos since it keeps a second copy of your data in the .svn directory, but rsync/tar/DVD backups take care of those.
How The Backup Process Has Changed Posted Nov 29, 2007 16:39 UTC (Thu) by nix (subscriber, #2304)
There's nothing that handles people who have symlinks and hardlinks in their $HOME well, as far as I know. (Let alone device files, but that really *would* be strange.)
How The Backup Process Has Changed Posted Nov 29, 2007 16:49 UTC (Thu) by jongeek (subscriber, #39233)
Good point about the hard links. Subversion can manage symbolic links perfectly well, though.
A plug for Box Backup... Posted Nov 30, 2007 3:21 UTC (Fri) by knobunc (subscriber, #4678)
A friend and I are both relatively well-connected by the same ISP so we wanted to cross-backup to each other's houses. The requirements were:
- Automated
- Preservation of older versions
- Encrypted
- Gentle to the network
- Able to back up large amounts of data
The nasty one to find was the encryption solution, but Box Backup (http://www.boxbackup.org/) has been performing decently well. Dirvish looked interesting, but the backup runs appeared to need a large amount of disk or memory to run. Please correct me if I am wrong. -ben
Dirvish memory use Posted Dec 2, 2007 0:10 UTC (Sun) by fmyhr (subscriber, #14803)
I haven't seen Dirvish use large temporary files during backup, if that's what you mean. And I haven't noticed problems with memory usage, though I'll admit I'm not 100% sure how to measure that. You mention encryption. That doesn't need to be built into the backup software. In the past I've used stunnel with dirvish, which worked ok. Now I'd probably use openvpn.
Dirvish memory use Posted Dec 8, 2007 2:10 UTC (Sat) by roelofs (subscriber, #2599)
> You mention encryption. That doesn't need to be built into the backup software.

It does if there's a chance you and your buddy will cease to be such at some point in the future. Or let's make it even easier: do you trust everybody your buddy trusts in his/her house? No? OK, then. Greg
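If the remote copy must stay unreadable even to whoever controls the other end, the data has to be encrypted before it leaves your machine; stunnel or openvpn only protect it in transit. One way to sketch that with standard tools, assuming GNU tar and OpenSSL 1.1.1 or later (for `-pbkdf2`); the function names and key-file location are hypothetical, and losing the key file means losing the backup. Keep in mind the earlier point that an encrypted archive, like a compressed one, tolerates media errors badly.

```shell
#!/bin/sh
# encrypt_backup SRC KEYFILE OUT: tar + gzip SRC, then encrypt with AES-256
# using the passphrase on the first line of KEYFILE. Only OUT goes off-site.
encrypt_backup() {
    tar -czf - -C "$1" . |
        openssl enc -aes-256-cbc -pbkdf2 -salt -pass file:"$2" -out "$3"
}

# decrypt_backup IN KEYFILE DESTDIR: reverse of encrypt_backup
decrypt_backup() {
    mkdir -p "$3"
    openssl enc -d -aes-256-cbc -pbkdf2 -pass file:"$2" -in "$1" |
        tar -xzf - -C "$3"
}
```

The buddy's machine only ever stores the encrypted file, so it can be copied there with rsync or scp without trusting anyone in that house.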
How The Backup Process Has Changed Posted Dec 1, 2007 9:07 UTC (Sat) by mennucc1 (subscriber, #14730)
hi I use dump for backups of my PC and my co-workers' PCs. (I have developed a long and complex shell script that dumps, incrementally, multiple partitions to a remote host.) I don't like the proposed line
/sbin/dump 0ufa - / | bzip2 > ./localslash`date +%Y%m%d`.bz2
In my backup script I instead use a line like
/sbin/dump -0 -j7 -q -u -f - / > ./localslash`date +%Y%m%d`.bz2
A good point of using 'dump -j7' is that it compresses files separately (like 'zip' but unlike 'tar cjf'), so that, when needed, 'restore -i' can recover single files fast and conveniently.
How The Backup Process Has Changed Posted Dec 3, 2007 6:14 UTC (Mon) by pascal.martin (subscriber, #2995)
All these tools have been around for 20 years or more; the concept is more than 40 years old. Time for something actually modern? I am afraid the world has changed a bit more than reported. Apparently the latest Mac OS, to be released soon, has continuous backup: install a disk as backup and the OS keeps logging file changes to it. As reported by a friend who tried a beta, the design supports connecting the backup disk only intermittently. Makes all these daily, weekly and monthly rotations sound stupid, don't you think? Thanks to Apple for reminding us that there is no need for backups to be a sweaty process. Computers can do it better.
How The Backup Process Has Changed Posted Dec 6, 2007 18:06 UTC (Thu) by thedevil (guest, #32913)
> continuous backup

But this doesn't deal with the natural disaster/terrorist attack scenario.
"dump 0ufa /" and "tar cf - /" do not back up the same data Posted Dec 12, 2007 10:04 UTC (Wed) by kaig1969 (guest, #37135)
The dump and tar examples are presented as if they did the same thing. But my understanding is that the dump example will dump the / filesystem whereas the tar example will dump all filesystems mounted on or below /. GNU Tar has the "--one-file-system" option that promises to deliver the dump behavior.
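For completeness, a sketch of a tar invocation that, like `dump`, stays on a single filesystem, using GNU tar's `--one-file-system` option mentioned above. The function name is hypothetical; `-C` makes member names relative to the mount point, so the archive restores cleanly into any directory.

```shell
#!/bin/sh
# backup_one_fs MOUNTPOINT OUTFILE
# Hypothetical helper: archive one filesystem without descending into other
# filesystems mounted below it (e.g. /proc, or /home on its own partition).
backup_one_fs() {
    tar --one-file-system -C "$1" -cf - . | gzip > "$2"
}

# e.g. backup_one_fs / ./localslash-$(date +%Y%m%d).tar.gz
```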
Copyright © 2007, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds