Here’s Why You Never Ignore Root’s Mail

A consulting client had asked for my help with a Solaris-to-Linux migration, and we had the big moves done. We were cleaning up, doing “while you’re here” tasks, verifying that some things were working as expected.

The backup process had moved to Linux about a year earlier, and migrating the web service for a large archive to another Linux server reminded them of it. Automounting, as we teach you to set up in Learning Tree’s Linux server administration course, made it trivial for the backup server to make a backup of that archive, which was stored on an NFS server.

Since backups are so crucial, we decided to check that this one was working as intended. Someone had written a script to automate the backups, run through cron at 0200 every weeknight. It simply made a snapshot and then a full backup of that snapshot. They were using some high-capacity compact tapes, loading a fresh cartridge at the end of each work day. Every weekday morning the tape had been rewound and ejected, ready for collection.

I was impressed with how the script checked everything imaginable. Could it find the tape drive? Did the drive hold a tape? Did the backup process have permission to read, write, and rewind the tape? Was the archive readable? And so on. After every command it checked $?, the exit status of the last command, to verify that it had really worked.
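I can’t reproduce the client’s script here, but the pattern of checking $? after each command looks roughly like the sketch below. The check_step function and the step descriptions are hypothetical names of mine, and true and false stand in for the real tape and archive commands.

```shell
#!/bin/sh
# Hypothetical helper: run a command, then inspect $? the way the
# backup script did after each of its steps.
check_step() {
    description=$1
    shift
    "$@"            # run the remaining arguments as the command
    rc=$?           # capture the exit status before anything else changes it
    if [ "$rc" -ne 0 ]; then
        echo "FAILED ($rc): $description"
        return "$rc"
    fi
    echo "ok: $description"
}

# 'true' and 'false' are stand-ins for real commands.
check_step "a step that succeeds" true
check_step "a step that fails" false || echo "a careful script would stop here"
```

Note that reporting a failure is only half the job. The caller still has to decide what to do with the non-zero return value.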

At the very end the script used echo to report how large the archive was and how long it took, from the very first tests through the final rewinding and ejection of the tape.

No News Is Bad News

I asked how long the backup usually took now, as they were using a USB-3-connected tape unit.

“I don’t know, it’s always done by very early in the morning.”

But what does it say in the email it sends?

“Email?”

Uh-oh.

Let’s re-read the script. Look, once the script starts, there’s no way for it to exit without generating some output, unless it somehow crashes in the middle. The last three commands rewind the tape, eject the tape, and run one echo command. Since the tape is always rewound and ejected…

By default, all output from a cron job is mailed to the job’s owner. This backup runs as root, so who gets that mail?

“Umm…”

Let’s look at /var/spool/mail/root. Wow. 12 megabytes. No one is reading this.
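If you want a quick number rather than a file size, a rough sketch like the following counts the messages in an mbox-style spool file by counting its “From ” separator lines. The count_mbox_messages name and the sample messages are made up for illustration, and this assumes the traditional mbox format rather than a Maildir.

```shell
#!/bin/sh
# Hypothetical helper: count messages in an mbox file. Each message
# starts with a separator line beginning with "From " (note the space).
count_mbox_messages() {
    # grep -c prints 0 but exits non-zero when nothing matches,
    # so the exit status is deliberately ignored.
    grep -c '^From ' "$1" || true
}

# Build a small sample mailbox so the sketch runs anywhere.
sample=$(mktemp)
cat > "$sample" <<'EOF'
From root@localhost Mon Jan  1 02:00:01 2024
Subject: backup report

tape drive not found

From root@localhost Tue Jan  2 02:00:01 2024
Subject: backup report

tape drive not found
EOF

count_mbox_messages "$sample"    # prints 2 for the sample above
rm -f "$sample"
```

On a real system you would point it at root’s spool file, such as the /var/spool/mail/root seen here, and you would need to be root to read it.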

Six Months of Good Luck

They became root and ran the mail command, finding several hundred messages. Every morning around 0400 there was one with the output of jobs scheduled in /etc/cron.daily/*, rotating logs and checking for full file systems. Before that every morning, right at 0200 for the past six months, was a very short message reporting that the tape drive could not be found, then that it was rewinding and ejecting the tape, and that the whole process took just a few seconds.

What had happened?

In the process of adding and rearranging some hardware six months ago, the tape drive they thought they were using for backups had been renamed from /dev/st0 to /dev/st1. The test for the existence of the drive was hard-coded as /dev/st0, then through some complex but more general logic the script determined that the drive was actually /dev/st1 and it would have used that for making and verifying the backup. As it was, it only rewound and ejected that device.
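One way to avoid that mismatch is to work out the device name exactly once, use that single result everywhere, and refuse to touch anything if the search fails. The sketch below is my own illustration, not the client’s code: first_char_device is a hypothetical helper, and the candidate list simply reuses the two device names from this story.

```shell
#!/bin/sh
# Hypothetical helper: print the first argument that exists as a
# character device, or return non-zero if none of them do.
first_char_device() {
    for candidate in "$@"; do
        if [ -c "$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# Detect once, then use the same variable for every later step.
if tape=$(first_char_device /dev/st0 /dev/st1); then
    echo "Using tape device $tape"
    # ...backup, verify, rewind, and eject would all use "$tape"...
else
    # A real script should exit non-zero here, before any rewind or eject.
    echo "No tape device found, nothing was backed up" >&2
fi
```

The point of the design is that the existence test and the later commands can never disagree about which device they mean.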

For six months they were carefully labeling and rotating a collection of blank tapes.

Avoiding This Problem

Make sure that some responsible human gets all mail sent to root!

Put something like this in the file /etc/aliases, changing the destination appropriately:

root:    yourlogin@mailserver.example.com

Better yet, send it to at least two people so there’s already a backup when you’re not there. You probably don’t need the full domain name:

root:    yourlogin@mailserver,otherlogin@mailserver

Then run newaliases to update /etc/aliases.db. This solution works for both Sendmail and Postfix.
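Before trusting the change, it’s worth confirming what the file actually says about root. The following is a minimal sketch that reads an aliases-style file directly; alias_target is a hypothetical helper of mine, it ignores continuation lines and :include: entries, and it is no substitute for sending yourself a real test message.

```shell
#!/bin/sh
# Hypothetical helper: print the right-hand side of the first entry
# for alias $1 in the aliases-style file $2.
alias_target() {
    awk -v name="$1" '
        /^[ \t]*#/ { next }              # skip comment lines
        index($0, ":") == 0 { next }     # skip lines with no colon
        {
            split($0, parts, ":")
            key = parts[1]
            gsub(/[ \t]/, "", key)       # trim whitespace around the name
            if (key == name) {
                sub(/^[^:]*:[ \t]*/, "") # drop "name:" and leading blanks
                print
                exit
            }
        }' "$2"
}

# Sample file so the sketch runs anywhere; use /etc/aliases for real.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# sample aliases file
postmaster: root
root:    yourlogin@mailserver,otherlogin@mailserver
EOF

alias_target root "$sample"   # prints yourlogin@mailserver,otherlogin@mailserver
rm -f "$sample"
```

If this prints nothing for root on your system, root’s mail is staying in the local spool.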

You could consider aliases served up via LDAP or NIS or SQL, but just make sure that it really works. See ldap_table(5), mysql_table(5), pgsql_table(5), etc. Specify what’s used for mail aliases in /etc/nsswitch.conf using the aliases tag. The GNU C library uses that file to determine the sources from which to obtain name-service information.

Oh, what’s up with the “(5)”? That’s the standard way of indicating the relevant section of the manual, in case there is more than one entry with the same name. For example, passwd(1) explains the command while passwd(5) explains the file in /etc.

Make Sure It All Works

Verify that the mail with the job’s output actually reaches you the next morning.

Then, randomly select one of your backups and make sure that you can restore the data stored on it.
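A restore test can be as simple as extracting the backup into a scratch directory and comparing it with the original. Here is a rough sketch under some loud assumptions: verify_restore is a hypothetical helper, a tar file on disk stands in for the tape, and the original data has not changed since the backup was made.

```shell
#!/bin/sh
# Hypothetical helper: extract archive $1 into a scratch directory and
# compare the result with the original data in directory $2.
verify_restore() {
    vr_archive=$1
    vr_original=$2
    vr_scratch=$(mktemp -d) || return 1
    if tar -xf "$vr_archive" -C "$vr_scratch" &&
       diff -r "$vr_original" "$vr_scratch" >/dev/null; then
        vr_result=0
    else
        vr_result=1
    fi
    rm -rf "$vr_scratch"
    return "$vr_result"
}

# Demonstration with throwaway data; a tar file stands in for the tape.
data=$(mktemp -d)
echo "important" > "$data/file.txt"
backup=$(mktemp)
tar -cf "$backup" -C "$data" .

if verify_restore "$backup" "$data"; then
    echo "restore matches the original"
else
    echo "restore DIFFERS from the original"
fi
rm -rf "$data" "$backup"
```

With real backups the live data will have drifted, so you would compare against a snapshot or against checksums recorded at backup time instead.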

Be careful, keep your data safe!
