Maintenance Windows
Maybe I’m getting old, but I’m discovering that I have less and less patience for working maintenance windows. For those of you not living la vida IT, a maintenance window is a scheduled window of time wherein usually-available bits of hardware are worked upon by a staff of IT monkeys. This generally means that a server or a network or an application is unavailable to end-users, which is why most maintenance windows get scheduled in the wee hours of the morning, so as to avoid needlessly complicating end-users’ lives. Of course, when you work in a small group with a large number of servers to manage (think 6 people : 350 servers) and you’re not staffed 24×7 (except for a weekly pager rotation), these crazy-hour maintenance windows can become quite the drag. For example: you start an all-hands-on-deck maintenance at 10:00pm on a Sunday night and finish up around 4:30am Monday morning. The question then becomes: who gets to come in at 9:00am, just a few hours later, in order to provide some sort of coverage for the office end of things? Luckily, one of the guys had some plans for Monday evening and wanted to get out early, so he chose to get little sleep, handle the morning duties and leave by 2:00pm.
As for me, this is my third six-day week in a row, and the fourth after-hours maintenance window I’ve worked this month. None of them were the happy, 30-minute reboot-and-run windows that we all enjoy; they were, instead, the 4-to-8-hour update, patch, reboot twice, wait for ping and pray kind of windows. Yes, I do get paid a stipend to work these windows, but the extra cash generally gets destroyed by taxes, and the lack of sleep and of time away from the office is beginning to wear me down. I’d like to say that I lead an exciting existence outside of work, which I miss during these long weeks, but that’s not exactly true. I just hate dreading Sundays.
It doesn’t help that my constant cries for logic and common sense are always ignored in favor of “trying to get as much done in as short a time as possible, and damn your sanity”. If we’ve got to reboot every server in our domain, twice, with only 6 people at the reins and only 5 hours in the window, every minute of it spoken for, does it make sense to have one guy decide to lock down a bunch of web servers he hasn’t had a chance to get to in recent weeks, or for us to test (not apply, but test - we’re not QA, mind you) a roll-back of applications that have been crashing since last week’s patch, or to dick around with non-crucial Word settings on a server that processes documents? I say no, but I’m judged obstinate and lazy (as per usual).
Of course, when 3:00am rolls around, everyone’s eyes are burning red from exhaustion, the guy who decided to lock down a bunch of servers hasn’t even started booting his remote servers, and we all have to pick up his slack and extend the window. Am I vindicated? No. It’s at that point that the guy who decided to add non-emergency work (for his own personal project, I might add) to a tightly-scheduled, overloaded maintenance window is hailed as “going the extra mile”, and picking up his slack is all a part of “teamwork”. Yeah. Just make sure you don’t ask why he didn’t take the opportunity to come in the week prior, during a different maintenance (one with some wiggle room) on the same system as the servers he’s locking down during this window (with no wiggle room). He was on vacation, you see, so everyone else has to pay the price for his not being available and not tasking anyone else with the lock-down while he was out. Asking questions is tantamount to corporate espionage, it would seem.
You know, most of this ranting is just general frustration. I like the guys I work with, and they would all bend over backwards to help me out (and have), so I don’t have a problem helping them out in return. It’s just the lack of logic and the refusal to consider alternatives that drive me nuts. We’ve got customers world-wide, so no matter when we set our maintenance windows, we’re going to piss off someone, somewhere. When we have to work a 5-hour window and still get people into the office to hold down the fort in the morning, wouldn’t it make more sense to run the maintenance from 7:00pm to midnight? I think so (and we reserve the right to do so in our customer documentation), but the bosses refuse.
You know the most entertaining part of this fiasco? For the first time, my manager was the on-call engineer during one of our big maintenances. He finally got to experience first-hand what it’s like to work extremely late and then be unable to get any sleep because the pager continually goes off (we average over 100 pages a day). It was especially busy since we’d rebooted every member of the domain, and some of the applications hosted on these servers don’t like to reconnect and function properly. Of course, since he’s the boss, he could forward all the pages to the cell phone of the guy who volunteered to come in early and turn off the pager so he could get some sleep. Since I’d originally brought up the problem of being on-call and working long maintenance windows but was shot down as a “complainer”, I am sorely tempted to say “I told you so”, but I’ll restrain myself to just posting on this blog instead.

