Charles Lanteigne Photo
 

Tidy and Effortless Redundancy


A few of the hard disks I use.

The importance of backing up your data is one of those things you often learn the hard way, once it's too late. But enough about that: I don't want to bore you with another treatise on the why, which you should already be familiar with, but rather discuss the how—specifically the hardware side of the equation.

There are tons of approaches to backing up (which is one part of the larger issue of asset management), and most of the ones I have seen are either needlessly convoluted, messy, or costly, or replace the initial problem with new ones.

I want to show you the approach to backing up I have adopted, but obviously there are both simpler and more demanding situations for which my approach won't be appropriate:

  • If you are a hobbyist and unlikely to use more than one or two disks for your entire image collection, you probably don't need a solution that can scale the way mine does. (But my approach is so simple and cost-effective that you could use it anyway, at a smaller scale.)
  • At the other end of the spectrum, if you are managing a larger studio where multiple people need to work simultaneously on shared files, you will likely need to use network storage and face the problem of concurrent versions, which I will not address here.

What You Need

It's amazing how far-fetched solutions to the simple problem of redundancy can get. All you really need is to have your data at multiple locations, so that when one of your disks fails, you have other copies available.

"First, my files are backed up on this Time Machine disk connected to a Firewire port. Then, every next day I do an incremental backup of the raw files on this red USB 3.0 drive here, and I do a separate backup on this larger gray disk on which I also copy the final edited files. Then—oops, be careful with that cable—every week I make a complete copy of everything on this network-attached RAID enclosure, over there."

It's a nonsensical mess—both logistically and materially—and it's not easily scalable. How did it ever get so complicated? This is closer to what it should be like:

Questions?

Anyway, let's first establish some of the points to keep in mind when discussing a backup strategy:

  • The backup disks do not need to provide particularly high performance. Even if it takes a bit more time to copy files over, it's not a big deal, because you are not constantly accessing those files. This is in sharp contrast with your work disks, which need to provide high performance.
  • For the same reason, your backup disks do not even need to be constantly spinning, which would only risk shortening their life.
  • You (probably) only need mirrors of your files, not every single past version of them. Apple's Time Machine, for example, is brilliant for your office documents, but does not necessarily make sense for your large Photoshop files. Keeping different versions also requires much more disk space, which is an additional problem you don't need to be encumbered with.
  • To be safe, you need three copies of your files on different disks(1). If one of your disks fails and you only had two copies, then until you have copied the files elsewhere they reside at only one location, which is very risky. (Two copies are, of course, still better than just one.)

What You Think You Want

If all of your files will, for the foreseeable future, fit comfortably on a single hard disk, you can just work on a local drive and make mirrors on two external drives—end of story. But for those of us who produce a lot of images, that's not going to cut it.

What happens when you must regularly work on multiple different disks? Obviously you can't have unlimited room for additional internal disks (if you have room for more than one in the first place, that is), so your work disks will have to be external.

External Drives

This presents the problem of the speed of the bus: you can't work on an external drive over a USB 2.0 link, for example, because the bandwidth is simply too low. Your natural inclination will be to get an external disk that uses a fast bus like IEEE 1394 (Firewire), USB 3.0, or Thunderbolt, the new kid on the block. This successfully dodges the speed issue, but for each additional drive you pay a premium over the price of a bare hard disk, because you have to buy the enclosure with it. This is particularly the case with the more exotic Firewire and the hot-off-the-press Thunderbolt kind.
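To put the bus speeds in perspective, here is a rough back-of-the-envelope sketch. It uses the nominal signaling rates of each bus, so the results are best-case lower bounds: real throughput is noticeably lower because of protocol overhead, and on the faster buses the hard disk itself usually becomes the bottleneck. The 50 GB shoot size is a made-up figure for illustration.

```python
# Nominal (theoretical) signaling rates in megabits per second.
# Real-world throughput is lower; treat the results as lower bounds.
NOMINAL_MBIT_PER_S = {
    "USB 2.0": 480,
    "FireWire 800": 800,
    "USB 3.0": 5000,
    "Thunderbolt (first generation)": 10000,
}


def best_case_seconds(size_gigabytes: float, mbit_per_s: float) -> float:
    """Lower bound on transfer time, in decimal units (1 GB = 8000 Mbit)."""
    return size_gigabytes * 8000 / mbit_per_s


if __name__ == "__main__":
    shoot_size_gb = 50  # hypothetical size of one shoot
    for bus, rate in NOMINAL_MBIT_PER_S.items():
        minutes = best_case_seconds(shoot_size_gb, rate) / 60
        print(f"{bus}: at least {minutes:.1f} min for {shoot_size_gb} GB")
```

Even in this idealized form, USB 2.0 is roughly ten times slower than USB 3.0, which is why it is fine for an occasional backup but painful for a work disk.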

This also adds considerable clutter to your desk. One or two drives are not so bad, but as you add more drives, things get messy. The problem is that each additional drive requires a data connection cable and a power adapter cable (which always manages to be different). Moreover, the available external drives change in shape constantly to follow the flavor of the month—some are vertical, some are horizontal, some are stackable (most are not), some are boxy or rounded, some have built-in cooling fans, etc. Even if you can buy a few of the same kind today, it is inevitable that next year you won't find the same design.

Nope, this won't do.
(Top image by brandon shigeta. Bottom image by Stefano Bertolotti.)

And that's the least of your problems, because even if they could all have the same inoffensive shape and be neatly stackable, who's to say that the bus they use will be supported by your new computer? Mac users can tell you all about that time when Apple basically abandoned Firewire (as it has suddenly abandoned other technologies in the past), forcing them to buy adapters to keep using their devices.

This is when you might come to the conclusion that it would be preferable to get a disk enclosure that can contain multiple disks. Not only does this create much less clutter, but you can kill two birds with one stone by making the enclosure take care of redundancy: no more backup management, you simply configure the enclosure to use the desired RAID(2) mode and you're set.

Drive Arrays

If you have been paying attention to what other photographers have gone through, you should immediately have reservations about this solution, and you'd be right—many have gone through hell after entrusting their data to these storage units. On paper, a RAID enclosure sounds very much like what magic must be like, but in practice, it's not that simple.

A small drobo unit.
(Image by Scott Beale/Laughing Squid)

Even if you can shoulder the hefty cost of those units, you must understand that your data now depends not only on the integrity of the disks, but also on the integrity of the enclosure itself. If it fails, you had better hope you can still find an equivalent unit able to interpret the array. Failing that, you need the unit to have been configured in strict mirroring mode (RAID-1) with a standard file system, and technically industrious friends able to help you. Otherwise you are in serious trouble.

Long story short, my feeling is that a RAID array has a place, but that place is either in a large-scale enterprise-level server, or in a situation where very high performance is required on large amounts of data—such as when editing 4K or raw video footage. For photographers, a RAID array is an overkill solution that comes with its own share of additional concerns.

Still, an enclosure that can hold multiple drives is not necessarily a bad idea if you use it differently, as you will see below, as long as you get a trayless one. Otherwise you will be forced to perpetually purchase trays, those useless annoying contraptions (and hope the manufacturer doesn't stop making them).

What You Should Consider

There are positive aspects to the solutions mentioned so far, but on top of their cost, they are not without problems. Luckily, there is a disconcertingly simple and cost-effective solution to all of this.

Trayless hard disk slots(3).
(The first disk is in use, the second one is just sitting there.)

There are a crap ton of benefits to this minimalistic solution:

  • A trayless slot costs next to nothing (it is just a plastic rack) and you'll only ever really need two or three: two to be able to copy between two disks, more if you want to continue working on other disks while backing up. (Note that there are other implementations in which multiple slots are closer together in a single rack that occupies more than one 5¼" bay.)
  • All you need to buy to increase your storage is bare hard disks—which cost less than external drives. Moreover, because of this, you actually know what models of hard disks you are paying for (what a concept!), whereas external enclosures often don't even reveal what's inside... (Hint: LaCie and G-Tech do not make hard disks themselves.)
  • You can choose high performance work disks and inexpensive backup disks. For example, I choose the Western Digital Black series for my work disks and Blue/Green series for backup purposes, which optimizes my investment.
  • The slots are embedded in the computer case, so they require zero cables.
  • They are connected directly to the SATA ports of the motherboard, so all the disks perform exactly like internal disks. It doesn't get more direct than that.
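  • You can insert/remove disks at any time, in the same fashion as a USB key: the SATA bus supports hot-plugging. (The controller typically needs to run in AHCI mode for this to work, and as with a USB key, you should eject the volume before pulling the disk out.)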
  • When disks are not in use, they are not spinning uselessly, which extends their life.

When I need more storage, all I need to do is purchase three new bare hard disks (one master and two copies) and I'm good to go. Seriously, guys.

Because my images are organized in chronological order, when a master volume is (almost) full, I can just continue where I left off on the next disk. It is very easy to locate files in the archive when I need to go back to a past project, because each set of disks is associated with a unique and unchanging time period. The Lightroom catalog is kept on an internal disk and contains references to all the images, including the offline archive, so it doesn't get easier to locate things.
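If you don't use a catalog, the same idea can be captured in a few lines. This is only an illustrative sketch: the volume labels and date ranges below are invented, and the point is simply that a chronological archive reduces "which disk is it on?" to a date lookup.

```python
from datetime import date
from typing import Optional

# Hypothetical archive layout: each master volume (and its two mirrors)
# covers one fixed, non-overlapping date range. Labels and dates are made up.
VOLUMES = [
    ("PHOTO_01", date(2009, 1, 1), date(2010, 6, 30)),
    ("PHOTO_02", date(2010, 7, 1), date(2011, 11, 15)),
    ("PHOTO_03", date(2011, 11, 16), date(2012, 12, 31)),
]


def volume_for(shoot_date: date) -> Optional[str]:
    """Return the label of the disk set holding a shoot, or None if unknown."""
    for label, start, end in VOLUMES:
        if start <= shoot_date <= end:
            return label
    return None
```

For example, `volume_for(date(2011, 3, 14))` points to the second set of disks, and a date outside every range returns `None`.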

I use a little program to mirror my master volume to the backup disks: it finds out what has changed since the last run and copies only the difference, so it takes almost no time at all(4). I can simply slide the backup disks into the slots and run the program whenever something needs to be synchronized, rather than according to an arbitrary schedule.
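For the curious, a one-way mirroring pass of this kind can be sketched in a few lines of Python. This is not the program I use, just a minimal illustration under simple assumptions: a file is considered changed when its size or modification time differs, and the function names and the `delete_extra` option are invented for the example. Established tools such as rsync do the same job far more thoroughly.

```python
import os
import shutil
from pathlib import Path
from typing import List


def needs_copy(src: Path, dst: Path) -> bool:
    """A file is copied when it is missing on the mirror or looks different."""
    if not dst.exists():
        return True
    s, d = src.stat(), dst.stat()
    # Allow 2 s of slack: some file systems store coarse timestamps.
    return s.st_size != d.st_size or abs(s.st_mtime - d.st_mtime) > 2


def mirror(master: Path, backup: Path, delete_extra: bool = False) -> List[Path]:
    """One-way sync from master to backup; returns the relative paths copied."""
    copied = []
    for root, _dirs, files in os.walk(master):
        rel_root = Path(root).relative_to(master)
        (backup / rel_root).mkdir(parents=True, exist_ok=True)
        for name in files:
            src = Path(root) / name
            dst = backup / rel_root / name
            if needs_copy(src, dst):
                shutil.copy2(src, dst)  # copy2 preserves modification times
                copied.append(rel_root / name)
    if delete_extra:
        # Optionally remove files that no longer exist on the master.
        for root, _dirs, files in os.walk(backup):
            rel_root = Path(root).relative_to(backup)
            for name in files:
                if not (master / rel_root / name).exists():
                    (Path(root) / name).unlink()
    return copied
```

Calling `mirror(Path("/Volumes/MASTER"), Path("/Volumes/BACKUP_A"))` (hypothetical mount points) copies only what is new or changed, so a second run right after the first has nothing left to do.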

But...

Yes, I know, your Mac doesn't have 5¼" slots available for this to be a possible solution. Or you would like to be able to carry your files with you, or use your drives with more than one computer. Those are valid points—not my situation, but I see how this can be the case. No worries.

You can still get most of the benefits of the proposed solution by using an external hard disk mounting device like a trayless multi-disk rack (that mounts drives separately—not as a RAID array) or, even less sophisticated, an inexpensive hard disk dock. Simply make sure your device uses a fast bus (eSATA, USB 3.0, or Thunderbolt, depending on the compatibility you need). One set of cables is all you will have to tolerate.

Example of a hard disk dock. Slide disks in, and boom.

Simple, neat, high performance, scalable and cheap.

Now you have no good reason not to close the loop on proper asset management.


(1) If you want to take things to the next level, you might also want to keep said disks at more than one physical location, in case something really bad happens like theft, fire, zombie uprisings, etc.

Yes, storing all of our files in the cloud (where ensuring data integrity becomes largely somebody else's business) is certainly where technology is going to take us at some point. In fact, you can already do it to a certain extent, for a fee. But until we can transfer gigabytes upon gigabytes of data at very high speeds, this will only be used as a supplemental degree of protection in an existing backup strategy. Go ahead and do it if you can, but be careful trusting an online service with your precious data: others have been burned before with lost data and companies shutting down without warning!

(2) A RAID is a system that allows the creation of storage units that are super fast (by distributing I/O operations on multiple concurrent disks—which actually increases the risk of data loss) or redundant (by writing the same data on multiple concurrent disks)—or both, if you throw enough disks at it. From your point of view, your data is at one single [virtual] location—the system takes care of the necessary wizardry.

(3) My computer case is a reused 12-year-old beige/blue atrocity. Its powerful innards are what I care about. Sue me.

(4) That is despite the fact that the program re-reads everything it writes before calling it a day, which is required to ensure that nothing went silently wrong during the copy.
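A read-back check of this sort can be approximated by hashing the source and the copy and comparing the results. The sketch below is an assumption about how such a check might be written, not a description of the program I use, and the function names are invented. One caveat: the operating system may serve the read-back from its cache rather than from the disk, so this catches many copy errors but is not an absolute guarantee.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large images don't fill the memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(src: Path, dst: Path) -> None:
    """Copy src to dst, then re-read both and fail loudly if they differ."""
    shutil.copy2(src, dst)
    if sha256_of(src) != sha256_of(dst):
        raise IOError(f"verification failed for {dst}")
```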
