Vicious Week at Nix Bits

Vicious Week

Published by scotth June 12th, 2008 in Backup, SAN, Solaris

Oh man, these past two weeks since getting back from Vegas have been brutal. I’ve had patches blow up in my face (partly because Solaris patching is still so 1995). A storage crunch on my NetWorker index store forced me to borrow some space and mount it over NFS. Which worked great. Until the server housing the NFS mounts core dumped and spontaneously rebooted in the middle of the backup window, wiping out the indexes of the two largest Oracle database servers in the environment. Yeah, the indexes were back on tape, but it would (and did) take a while to get em back. And of course I had to find some more spare space. That spare space had to come off of local, unprotected disk again (except for tape) since we are out of SAN storage until some new hardware hits the floor. That means a month or so. In the middle of hurricane season, as usual. All of which has meant precious little sleep of late.

Oh, and I’ve had my problems with Legato, er, EMC NetWorker over the past few years, but none of the above was NetWorker bug-related and it handled all that mess far more gracefully than I would’ve expected. It failed and restarted backups automatically, and recovering indexes was a simple process. I’ve done some bootstrap recoveries for DR tests in the past, but this was the real deal. So hoorah for NetWorker for doing something right and not being the cause of sleepless nights (yes, 7.3.3 is very stable in our busy, big, and hopelessly complex environment).

And of course there has been the inevitable management foolishness. Most of the above didn’t affect anything outside of “my world” but as soon as some manager sees a status update, they try and correlate some technical issue they’re having to my event and I have to answer all this foolishness. Politely. With little or no sleep. Not to mention, more foolishness sending me out of town for what looks to me like pure politics and little technical merit, but maybe I’m missing something. Nah, probably not. It’s a dog and pony show. I wonder if I’m a dog, or a pony?

Anyway, if anybody out there has some really super secret vacation ideas they want to share, send em my way. My wife and I really haven’t been on a vacation alone since 2004, our honeymoon. And, apparently, travel is, like, really expensive right now.

Vegas a few weeks ago didn’t do much for me on the vacation side, as mentioned below. But as a technical/educational trip, it was pretty cool. I’m not much of an EMC fanboy. We use both NetApp and Sun/HDS storage here in addition to EMC. There are things about those platforms I like a lot better than the mish-mash of storage options EMC presents.

I attended a ton of technical sessions. There was a large VMware presence there, which was pretty cool. We are deploying VMware ESX, though I’m not directly involved in that activity. We handle the storage for ’em though, as well as the backups. To that end, I got really interested in Avamar for doing our VMware guest backups. We did an eval here and it looks pretty good and it looks like we’re actually going to get it. I went to several sessions on that as well and feel a lot more comfortable with having stuck my neck out for this solution.

I’m sure there will be headaches in the deployment, but it should help. And I’ll be able to remove some complexity from our NetWorker deployment and free up some index space, by potentially removing 100 or more clients and putting them in Avamar. File servers, VMware guests, and some DMZ-resident hosts are all good candidates for that. Unlike a lot of EMC NetWorker customers, we aren’t initially interested in using NetWorker 7.4 with nominally-integrated Avamar. One of my goals is to reduce my complexity and index space in our huge NetWorker setup. So dividing and conquering actually simplifies things for me rather than having one global namespace. I had some conversations with some EMC techies (and not sales techies) and they said that was unique in their experience.

But the reality for me is, even if the software can technically scale to 1000+ clients processing 100s of terabytes/week of backups (and DR copies), can it really be managed by humans at that point? At such a scale, can you even get a window to patch things, do upgrades, etc? And if you do have a problem, the pain you have and the pain that problem can cause can be quite big indeed. Obviously for me it already is with ~800 clients and about 70TB/week. (See paragraph one.)

I’m still digesting material from the convention and looking forward to getting my hands on the presentation material. Like I said, it was pretty interesting stuff. If you’re, ya know, a geek.
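
For the geeks, here is a rough back-of-the-envelope sketch of what the scale mentioned above works out to. The only figures taken from this post are ~800 clients and about 70TB/week; the decimal units (1 TB = 1000 GB), the even spread across clients, and the idea of backups running around the clock are all simplifying assumptions, and the function names are made up for illustration.

```python
# Back-of-the-envelope numbers for the backup scale described in the post.
# From the post: ~800 clients, ~70 TB/week. Everything else is an assumption.
CLIENTS = 800
TB_PER_WEEK = 70
GB_PER_TB = 1000                    # assumption: decimal units
MB_PER_GB = 1000                    # assumption: decimal units
SECONDS_PER_WEEK = 7 * 24 * 3600    # assumption: backups spread over the whole week


def avg_gb_per_client(tb_per_week, clients):
    """Average weekly backup volume per client, in GB (even spread assumed)."""
    return tb_per_week * GB_PER_TB / clients


def sustained_mb_per_sec(tb_per_week):
    """Average throughput in MB/s if the load were spread evenly over the week."""
    return tb_per_week * GB_PER_TB * MB_PER_GB / SECONDS_PER_WEEK


print(round(avg_gb_per_client(TB_PER_WEEK, CLIENTS), 1))   # 87.5
print(round(sustained_mb_per_sec(TB_PER_WEEK), 1))         # 115.7
```

Real backups are crammed into nightly windows and the big Oracle servers dwarf the average client, so actual peaks are a lot higher than these averages suggest.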

Note: Yes, this is the most technical/geeky thing I’ve posted in years.

Scott Harney
