

IO You an Explanation

This blog post is going to be relatively short and sweet, as my expertise in the storage realm is limited. Yesterday I had the pleasure of learning something new about storage that I found interesting and wanted to share with you.

This month’s meme is hosted by Mike Walsh. For my post I don’t have a solution so much as a nugget of information I found interesting and want to pass along. I was speaking with a consultant yesterday about a few things, and the topic of his experiences with Oracle DBAs versus SQL Server DBAs during storage (SAN) consultations came up. Naturally this perked my ears up, and I asked him to explain. He told me that, in his experience, the Oracle DBAs he has worked with come across as rather paranoid and never believe anything he tells them, even when he shows them whitepapers straight from the storage vendor.

In this particular case we were talking about a NetApp best-practice recommendation that seems rather contradictory, and (rightfully so) the DBAs were skeptical and kept asking the same question over and over again despite having it answered… over and over again. What’s that recommendation, you ask? In the NetApp world there are what are called Aggregates, which are nothing more than multiple RAID groups. Here’s the excerpt from Wikipedia about it:

NetApp supports either , , or disk drives, which it groups into RAID (Redundant Array of Inexpensive Disks or Redundant Array of Independent Disks) groups of up to 28 (26 data disks plus 2 parity disks). Multiple RAID groups form an “aggregate”; and within aggregates Data ONTAP operating system sets up “flexible volumes” to actually store data that users can access. An alternative is “Traditional volumes” where one or more RAID groups form a single static volume. Flexible volumes offer the advantage that many of them can be created on a single aggregate and resized at any time. Smaller volumes can then share all of the spindles available to the underlying aggregate. Traditional volumes and aggregates can only be expanded, never contracted. However, Traditional volumes can (theoretically) handle slightly higher I/O throughput than flexible volumes (with the same number of spindles), as they do not have to go through an additional virtualisation layer to talk to the underlying disk.
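To make the Wikipedia description a little more concrete, here is a minimal Python sketch of the hierarchy it describes: disks grouped into RAID groups of at most 28 (two of them parity), RAID groups pooled into an aggregate, and flexible volumes (FlexVols) carved out of the aggregate and resized at will. The class names, the `disk_size_gb` figure, and the simple capacity check are hypothetical illustrations, not NetApp’s API, and the model ignores real-world overhead such as WAFL and snapshot reserves.

```python
from dataclasses import dataclass, field

# Figures from the Wikipedia excerpt: a RAID group holds at most 28 disks,
# two of which are parity, leaving up to 26 data disks.
MAX_RAID_GROUP_DISKS = 28
PARITY_DISKS_PER_GROUP = 2


@dataclass
class RaidGroup:
    disks: int

    def __post_init__(self) -> None:
        # Need at least one data disk on top of the two parity disks.
        if not PARITY_DISKS_PER_GROUP < self.disks <= MAX_RAID_GROUP_DISKS:
            raise ValueError(f"a RAID group needs 3 to {MAX_RAID_GROUP_DISKS} disks")

    @property
    def data_disks(self) -> int:
        return self.disks - PARITY_DISKS_PER_GROUP


@dataclass
class Aggregate:
    """Hypothetical model: an aggregate is a pool of RAID groups."""

    disk_size_gb: int = 300  # hypothetical per-disk capacity, no reserves modeled
    raid_groups: list[RaidGroup] = field(default_factory=list)
    flexvols: dict[str, int] = field(default_factory=dict)  # name -> size in GB

    def add_raid_group(self, group: RaidGroup) -> None:
        # Aggregates can only grow, so there is deliberately no remove method.
        self.raid_groups.append(group)

    @property
    def data_spindles(self) -> int:
        # Every FlexVol in the aggregate shares all of these spindles.
        return sum(g.data_disks for g in self.raid_groups)

    @property
    def capacity_gb(self) -> int:
        return self.data_spindles * self.disk_size_gb

    def resize_flexvol(self, name: str, size_gb: int) -> None:
        """Create, grow, or shrink a FlexVol, as long as the aggregate has room."""
        used_by_others = sum(s for n, s in self.flexvols.items() if n != name)
        if used_by_others + size_gb > self.capacity_gb:
            raise ValueError("not enough free space in the aggregate")
        self.flexvols[name] = size_gb


# One aggregate holding both a datafile volume and a logfile volume
# (volume names are hypothetical).
aggr = Aggregate()
aggr.add_raid_group(RaidGroup(disks=16))
aggr.add_raid_group(RaidGroup(disks=16))
aggr.resize_flexvol("oradata", 4000)
aggr.resize_flexvol("oralogs", 500)
aggr.resize_flexvol("oradata", 3000)  # shrinking is just another resize
print(aggr.data_spindles, aggr.capacity_gb)  # 28 8400
```

The point of the model is the asymmetry the excerpt describes: the aggregate itself only ever grows, while the volumes inside it can be resized freely and all draw on the same pool of data spindles.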

OK, so what’s so different about that? Well, that’s not the interesting part. What’s interesting is that NetApp’s best-practice guidance for Oracle explicitly states:

For Oracle databases it is recommended that you pool all your disks into a single large aggregate and use FlexVol volumes for your database datafiles and logfiles as described below. This provides the benefit of much simpler administration, particularly for growing and reducing volume sizes without affecting performance. For more details on exact layout recommendations, refer to [2].

Now think about that for a minute. As a SQL Server DBA you’re probably having a mental breakdown, as I did when first slapped with this one, since they’re essentially telling you to put all your eggs in one basket, for your own good. This is where our conversation got interesting: he started breaking down for me exactly how Aggregates work, how NetApp’s algorithms function, and WHY this best practice exists and isn’t as bad as it appears at first glance. Apparently, because of the way NetApp’s Aggregates work, the more you expand your Aggregate (read: add more disks), the more you actually improve performance, since you’re adding more spindles for every volume in it to use. At this point you storage guys are probably ready to tear me a new one, as I may or may not be explaining this accurately, to which I restate: “I’m not a storage guy, I’m a DBA learning something new and attempting to relay this information as best I understood it.”
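The spindle argument boils down to simple arithmetic. The sketch below is a deliberately naive, hypothetical estimate: it assumes each data disk contributes a fixed 150 random IOPS (a made-up round number) and that a volume’s I/O is spread evenly over every data spindle in its aggregate. It ignores caching, NetApp’s write handling, and the fact that in the pooled layout datafile and log I/O compete for the same disks, which is exactly the concern that makes SQL Server DBAs nervous.

```python
# Hypothetical round figure for the random IOPS one spinning disk sustains;
# real numbers depend on drive type, RPM, and workload.
IOPS_PER_DATA_DISK = 150
PARITY_DISKS_PER_GROUP = 2


def data_spindles(raid_group_sizes: list[int]) -> int:
    """Data (non-parity) disks across a set of RAID groups."""
    return sum(size - PARITY_DISKS_PER_GROUP for size in raid_group_sizes)


def peak_random_iops(raid_group_sizes: list[int]) -> int:
    """Rough ceiling if a volume's I/O is striped across every data spindle."""
    return data_spindles(raid_group_sizes) * IOPS_PER_DATA_DISK


# Split layout: datafiles and logs each live on their own dedicated disks.
datafile_groups = [14]
logfile_groups = [6]

# Pooled layout: the same 20 disks in one aggregate shared by both volumes.
pooled_groups = datafile_groups + logfile_groups  # new list, originals untouched

print("datafiles, split:", peak_random_iops(datafile_groups))  # 1800
print("datafiles, pooled:", peak_random_iops(pooled_groups))  # 2400

# Growing the aggregate raises the ceiling for every volume in it.
pooled_groups.append(16)
print("datafiles, pooled + 16 disks:", peak_random_iops(pooled_groups))  # 4500
```

Under those assumptions, the same 20 disks give the datafile volume a higher ceiling when pooled, and each disk added to the aggregate raises that ceiling for every volume at once.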

Which brings me to the point of my post. As a DBA, crazy things like a best-practice recommendation that doesn’t seem to make sense can and will come up in your career. Should you question them? Without a doubt! After all, it’s your bacon on the line after these guys are gone. The important part, however, is the learning. Ask questions, recognize the differences between technologies, and understand the hows and whys. In this post I talked about NetApp’s approach, but EMC works differently and uses different terminology. It may not be your job to be a SAN admin, but as a DBA I think it’s essential to understand all the technologies involved in your configuration and to work with the people responsible for them to come up with the solution that works best for you. There are plenty of resources out there to garner knowledge from; they’re just a quick …

Follow the hashtag on Twitter to check out everyone’s posts.


DBA Horror Stories

Since Halloween is this weekend, I figured this would be a great time to start this meme: give us your best database/IT horror story to date.

I’ve been fortunate so far in that the databases I’ve dealt with haven’t had any crazy problems, and for that I’m thankful. Given that, my story is more of a general IT horror story. It was a dark and stormy night (actually it was a clear, humid, hot day, but those don’t work as well for these). I woke up that fine morning to a call with the two words every IT pro dreads hearing: major outage. Once I got into work and fueled up on coffee, I got the details of what had happened that fateful morning.

Every month our operations staff does a generator load test in which we switch from commercial power to generator power. On this day, however, the generator felt saucy and fate gave us the finger. They threw the switch as they had so many times before when “something happened” and the generator suffered a major failure. Normally this wouldn’t be too bad, as you can switch right back to commercial power, but, nay, not this day. For some reason the switch was unable to cut back, so our whole data center went down faster than the Balloon Boy family’s credibility. Like over-caffeinated monkeys on speed, everyone leapt into action to find out the extent of the affected systems and implement the appropriate DR plans. After some scrambling the picture looked bleak. Despite our having an alternate data center, it turned out some of the systems on that side relied on the SAN… in the data center… that was now down and out. Awesome. Over the next few hours meetings were held to determine which systems needed to come back up, and in what order (yes, I know, this should have already been established, but as we soon discovered, our DR plans were dated). Power was restored by noon, and that’s when the real work began.

As we brought systems back online, a flurry of disk checks and fixes began. Things slowly returned to normal as everyone hunkered down and brought everything back up. But not all was well in Whoville. Ripping a SAN out from underneath servers is not the greatest thing to happen. To make matters really awesome, we’re a heavy VMware shop, and guess where our VMDK files live? Yeah… well, in the midst of the madness we lost 2 LUNs to corruption. Couple this with the fact that some of those servers turned out not to be backed up, and needless to say you have a recipe for pure FUN! The good news is we have a good staff of dedicated folks who stayed as long as it took to get as many systems as possible back online and working again. By 2:00 am (the failure occurred around 7:00 am) we were 95% back up and running with no major losses of data. Over the next few weeks I had the pleasure of working the ever-living hell out of Arcserve’s restore feature, as well as checking and double-checking that servers were being backed up.

Moral of the story is:

Have an up-to-date DR plan; you never know when disaster is going to strike. Jonathan Kehayias recently wrote about this.

Time to do some tagging:

(since I mentioned him already)

aka MidnightDBA (let’s put that new netbook to work ;-D )

Happy Halloween everyone!