Wednesday, April 25, 2007

WAFL

This awesome paper was written almost 10 years ago:

http://www.usenix.org/events/osdi99/full_papers/hutchinson/hutchinson_html/hutchinson.html

It's still pretty much the best explanation of WAFL internals (sans fluff) I've ever read. Read it the next time you wonder why snapshots on a NetApp are so darn fast.
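
If you want the one-line version before you dive in: WAFL never overwrites a block in place, so a snapshot is just a second pointer to the current root of the block tree. Here's a toy sketch of that idea in Python (my own illustration of the paper's design, not NetApp's code):

    # A filesystem as a tree of immutable blocks. A snapshot is just
    # a saved pointer to the current root; writes copy their way up
    # to a new root instead of overwriting, so the old tree survives.
    class Block:
        def __init__(self, data=None, children=None):
            self.data, self.children = data, children or {}

    root = Block(children={"file": Block(data="v1")})
    snapshot = root                # taking a snapshot: one pointer copy

    # Copy-on-write update: new leaf, new root; the snapshot is untouched.
    root = Block(children={**root.children, "file": Block(data="v2")})

    print(root.children["file"].data)      # v2 -- the live filesystem
    print(snapshot.children["file"].data)  # v1 -- the snapshot, for free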

Monday, April 23, 2007

More Storage Networking trends

So we discussed global namespace and the different ways in which we can implement it. What other trends are there in the storage networking world?

I'm going to take a stab here:
1. Backup: Everybody hates it, but you gotta have it. Disk capacities have grown exponentially, but tape speeds haven't kept pace. So how do you back up your 20TB data center without bringing the production network down, or spending tons of money on some super-expensive solution? (See the back-of-envelope sketch after this list.)
2. Disaster Recovery: Given the times we live in, it's a big deal. DR is not just about a DC going away; it's about individual filers going down as well. How do you drive down DR costs while making sure that the DR site comes back up within a reasonable amount of time?
3. Data proliferation: We stash ridiculous amounts of data on our disk drives. How does the person in IT sort out the junk from the useful stuff? OTOH, given the low cost of storage, is this even a problem for IT?
4. Server proliferation: Walk into any data center today (mid-size onwards) and there are literally hundreds of servers and devices. This is getting out of control. Talk with any data center facilities manager and they will tell you EDC power consumption is going through the roof. I don't think our state's grid can handle it. We've heard of 365 Main's power outage, when both utility power and backup power failed. VMware/Xen are one solution, but I think more needs to be done.
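
On the backup point above, a quick back-of-envelope calculation shows why tape alone doesn't cut it anymore. These numbers are assumptions: 20 TB of data and a single LTO-3 class drive at roughly 80 MB/s native, no compression; your hardware will vary:

    # Assumed: 20 TB of data, one LTO-3 class drive at ~80 MB/s native.
    DATA_TB = 20
    TAPE_MB_PER_SEC = 80

    total_mb = DATA_TB * 1024 * 1024
    hours = total_mb / TAPE_MB_PER_SEC / 3600
    print(f"Full backup of {DATA_TB} TB: ~{hours:.0f} hours on one drive")
    # -> ~73 hours, which is why people multiplex drives, do
    #    incrementals, or back up from snapshots/disk instead.

Three days of streaming for one full backup, and that's before you account for the production network sitting in between.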

Monday, April 2, 2007

Global namespace revisited

We defined the problem earlier as IT folks not being able to "glue" their storage together. In a very general way, I think this is what people mean when they say "global namespace". How do you take a new storage appliance or server and make it appear like it's part of the old namespace? This is a tough technical problem to solve.

There seem to be two approaches to solving this problem:
1. Place the solution outside the storage server. This is the approach taken by NuView (since acquired by Brocade), Rainfinity (acquired by EMC), NeoPath (acquired by Cisco) -- notice a pattern here -- and Acopia, which is still independent. Attune Systems is another company that takes a similar approach, but for Microsoft environments only.

The idea here is to stick a virtualization layer between the client and the server, and have that layer provide a namespace that spans two or more fileservers. The advantage of this approach is that you don't need to change your storage servers. If you like working with your NetApp, EMC, Sun or Windows 2003 (hey, it could happen) server, you can continue doing so. The disadvantage is that you essentially have another network layer to worry about. If all your users are going through your virtualization layer and that box goes down, what do you do? Brocade/NuView solves this problem by surfing on top of DFS. The other guys (don't know about Attune) cluster their products together.
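
To make the idea concrete, here's a hypothetical junction table a virtualization layer might keep. This is purely illustrative -- none of these vendors publish their internals, and the paths and names are made up:

    # Hypothetical junction table: global paths -> (server, export).
    JUNCTIONS = {
        "/global/project/archive": ("server1", "/project"),
        "/global/project/current": ("server2", "/project"),
    }

    def resolve(global_path):
        """Map a path in the unified namespace to a backend location."""
        for prefix, (server, export) in JUNCTIONS.items():
            if global_path == prefix or global_path.startswith(prefix + "/"):
                remainder = global_path[len(prefix):] or "/"
                return server, export, remainder
        raise LookupError("no junction for " + global_path)

    # Clients only ever see /global/...; the shim either forwards the
    # NFS/CIFS traffic itself (in-band) or hands back a referral
    # (out-of-band, the DFS-style approach Brocade/NuView rides on).
    print(resolve("/global/project/archive/design.doc"))
    # -> ('server1', '/project', '/design.doc')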

2. Place the solution inside the storage server. There are many innovative companies providing products here. One is Isilon. With their solution, you buy a storage array; when you run out of space on it, you add another one. Everything automagically falls into place, and you've just increased your storage transparently. Another is Ibrix, which has an absolutely fascinating solution that you have to check out. The idea here is to really load-balance your storage so it can scale linearly, on the hardware of your choice. The granddaddy of all of these is Spinnaker, which was acquired by NetApp back in 2003 and has been reborn as Data ONTAP GX.
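
One generic technique for this kind of "just add another array" scaling is consistent hashing, which lets a new node take over a slice of the data without remapping most of it. I'm not claiming this is what Isilon or Ibrix actually ship; it's just a sketch of why linear scale-out is possible at all:

    import hashlib
    from bisect import bisect

    def _h(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    class Ring:
        """Toy consistent-hash ring with virtual nodes."""
        def __init__(self, nodes, vnodes=64):
            self.points = sorted((_h(f"{n}:{i}"), n)
                                 for n in nodes for i in range(vnodes))
            self.hashes = [p for p, _ in self.points]

        def owner(self, key):
            idx = bisect(self.hashes, _h(key)) % len(self.points)
            return self.points[idx][1]

    ring = Ring(["array1", "array2"])
    before = {f"f{i}": ring.owner(f"f{i}") for i in range(1000)}
    ring = Ring(["array1", "array2", "array3"])  # "just add another one"
    moved = sum(before[k] != ring.owner(k) for k in before)
    print(f"{moved} of 1000 files moved")  # roughly a third, not all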

At this point, though, we get into the world of clustered filesystems. An interesting approach to clustered filesystems comes from SGI. CXFS involves a separate metadata server, which keeps track of which file is where and distributes traffic to the appropriate members of the cluster that run XFS. This is not a simple product to roll out: SGI sells you a warm body to install and deploy the metadata server for you, should you choose to go this route.
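
The metadata-server pattern itself is simple to sketch, even if the real product isn't. In miniature (made-up names; one round-trip to the MDS, then the client goes direct):

    # The MDS's view of the world: path -> the node that owns the data.
    # Made-up names; the real thing also handles locking, layout, etc.
    METADATA = {
        "/vol/renders/frame001.exr": "node-a",
        "/vol/renders/frame002.exr": "node-b",
    }

    def locate(path):
        """One round-trip to the metadata server; then go direct."""
        try:
            return METADATA[path]
        except KeyError:
            raise FileNotFoundError(path)

    print(locate("/vol/renders/frame002.exr"))  # -> node-b
    # The win: data traffic fans out across the cluster members.
    # The catch: the MDS is now the component you must keep alive.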

The advantage here, whichever solution you go with, is that it's going to be integrated. There is no virtualization shim that you stick between the client and the server. Obviously, the disadvantage is that you have a brand new OS your IT folks have to learn.

So which approach is going to win? The market will tell, obviously.

Half Moon Bay pictures

For a quick break from technical stuff, here is a picture of sea anemones that I took a few weekends ago. There is a big one on the top, and a whole bunch of tiny ones on the mound at the bottom of the picture.

Here is a starfish.

Sunday, April 1, 2007

Global namespace problem

This is one of those buzzwords, like virtualization, that everybody loves. And loves to define in their own way. Let's first try to understand the problem.

When you buy a storage server or appliance from a vendor, you're pretty much limited by the amount of disk you can attach to it. These days a mid-tier system can start with a capacity of 4TB and grow to 252TB. This is great, but how long before you run out of horsepower on the system CPU? So you keep serving out data on the single system until the CPU is pegged, then buy a new storage system. Worse, you might have started with the low-end model and run out of storage capacity at some lower number.

Cool. Now you tell all your users: hey, your old files are on server1:/project, new stuff is on server2:/project. This can quickly spiral out of control. Actually, because of volume size limitations, it's probably going to be server1:/project1, server1:/project2, server2:/project and so on.
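
Sketched as the lookup table your users effectively have to carry around in their heads (paths made up, pain real):

    # Every time a volume fills up, another entry lands in this table,
    # and every user and script has to learn it.
    MOUNTS = {
        "old project files":  "server1:/project1",
        "2006 overflow":      "server1:/project2",
        "current work":       "server2:/project",
    }
    # A global namespace hides all of this behind one path, so the
    # table lives in the infrastructure instead of in users' heads.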

So how do you solve this problem?

Choices, choices, choices

My employer, NeoPath Networks, got acquired by Cisco Systems two weeks ago. It was a fun ride while it lasted, almost 3 years. I got the "package", which means I'm looking for a job. I got to work with some great folks and will miss them.

It's an interesting time in the storage industry. IT folks seem to be more open to buying new gear, and there are some real problems folks have in the data center. In this blog I'd like to discuss some of them.

I will be moving to Half Moon Bay very soon, and will post pictures and information about the city and the coast as well.