“Oops”

System administration isn’t always easy.

I just read some horror stories from Unix administration, one here and another one here. I deeply share the grief and agony of the administrators who accidentally typed the right command in the wrong place and wiped out the accumulated work of 100 users in one second.

You know, you sit there in your chair, typing on your keyboard, going through routine maintenance of a server, and suddenly you spot something odd. Like, “what’s that file doing there?” You don’t see any reason for it to be there, so you delete it. Or maybe you’re just going to wipe out some temporary files with the “rm -rf temp*” command and accidentally put a space between the temp and the *. Now the command deletes not every file starting with “temp”, but rather the file “temp” and every file that matches “*”, which is… every file in the current directory. And, thanks to the -r flag, everything in every directory below it.
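To see why one stray space matters so much, it helps to remember that the shell, not rm, expands the wildcards: the command line is split into words, and each word containing a glob character is replaced by the matching file names before rm ever runs. The sketch below mimics that expansion in Python. It is a simplification under stated assumptions: the directory listing is made up, matching uses Python’s fnmatch rather than a real shell, hidden (dot) files are skipped as shells do by default, and results are sorted by code point for readability.

```python
import fnmatch
import shlex

def expand(command: str, entries: list[str]) -> list[str]:
    """Roughly mimic how a POSIX shell expands globs in a command line.

    `entries` stands in for a (hypothetical) directory listing. Each word
    containing *, ? or [ is replaced by the sorted names it matches; a word
    that matches nothing is kept literally, as POSIX shells do by default.
    """
    words = shlex.split(command)  # split the line into words, like the shell
    result = []
    for word in words:
        if any(ch in word for ch in "*?["):
            # Shells skip dot files for patterns that don't start with "."
            matches = sorted(
                name for name in entries
                if fnmatch.fnmatchcase(name, word) and not name.startswith(".")
            )
            result.extend(matches if matches else [word])
        else:
            result.append(word)
    return result

# A made-up directory listing for illustration only.
entries = ["temp", "temp1.log", "temp_old", "invoices", "home", "notes.txt"]

print(expand("rm -rf temp*", entries))
# ['rm', '-rf', 'temp', 'temp1.log', 'temp_old']
print(expand("rm -rf temp *", entries))
# ['rm', '-rf', 'temp', 'home', 'invoices', 'notes.txt', 'temp', 'temp1.log', 'temp_old']
```

With the space, “*” becomes its own word and expands to everything in the directory, including “invoices” and “home”, and -r then carries the deletion into each of those directories.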

In that way, Unix is marvellous. It adheres to a radically different concept than Windows. Windows and its predecessors made the expression WYSIWYG famous: What You See Is What You Get. Unix is more sinister: it adheres to YAFIYGI, You Asked For It, You Got It. No warnings, no “do you really want to delete all files?”. It just says “yes, sir” and does it. With brutal efficiency.

There are two common factors in all of these administration mistakes. One, as a systems administrator you typically aren’t working on your own computer: you’re working on servers that provide critical services for tens, hundreds, or even thousands of users. What you do there affects more people than you normally want to think about. Two, that particular feeling of sudden coldness that starts way down in your spine and slowly reaches your neck. By the time you realize what you’ve done, your face turns deep red and a cold sweat breaks out on your forehead as you mutely stare at the fateful letters you just typed on the screen. Or, as one of those stories tells it:

When I arrived at his office, his door was ajar, and within ten seconds I realised what the problem was. James, our manager, was sat down, head in hands, hands between knees, as one whose world has just come to an end. Our newly-appointed system programmer, Neil, was beside him, gazing listlessly at the screen of his terminal.

I don’t have any particular recollection of such sudden mistakes, but I know I’ve made similar ones: I know the feeling all too well. There was one time when I printed 20,000 bank account statements with the wrong check digit on every account number, and the transaction department was swamped with calls from worried customers for the next few days. Or the time when the job we ran on the 28th of every month accidentally wiped out all invoices for all customers, instead of just the old, expired ones.

After ten or fifteen years in the industry, I like to think I’m a little more careful. But the other side of the truth is that I haven’t actually been involved in systems administration for a while now. In any case, more public services than you want to think about depend on people like that, who think “oh, this should work” only to realize, a second or two too late, that the command had certain unforeseen side effects.

The next time you hear about a nationwide ATM outage that lasted a few hours, remember that somewhere, up in some office in Stockholm, there was probably a guy sitting completely frozen at his keyboard, staring blankly at a single blinking cursor on an otherwise empty screen.

Oops.
