Divided We Conquer

Published Date: 01-Jul-2006 | Last Updated: 01-Jul-2006
About as mystical and "ooh"-provoking as the Matrix itself is this crazy new concept called The Grid. Still being researched by universities and big corporations alike, Grid Computing promises to change the way networks work. Eventually, we should be able to harness the processing power of every computer on the Grid and get complex processing done in a jiffy. For now, though, that scenario is still some way off.

And then there's the case for Grid Storage. The term is fairly self-explanatory: rather than relying on one dedicated file server for your LAN, you spread the job of storing data across nearly every computer on the network.

The Grid And You
So what use could you possibly derive from a storage grid in your office? Well, for one thing, you won't need to invest in a powerful, purpose-built file server; you could make do with a few half-decent PCs that nobody uses any more.

More important, though, is the question of failure. When your central file server decides to pack up, you and your team will most likely spend the rest of the day twiddling your thumbs or playing table-tennis, doing little to earn your keep. A storage grid, however, is much like a RAID (Redundant Array of Independent Disks) setup spread over a LAN: if one unit goes poof, the others are still there to keep your data available.
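The resilience comes from simple redundancy: keep more than one copy of each piece of data, on different machines. The Python sketch below illustrates plain mirroring-style replication; the function names (`place_replicas`, `still_available`), the node names and the copy count are all hypothetical, and this is not how GB-DISK itself decides where data goes.

```python
def place_replicas(block_ids, nodes, copies=2):
    """Assign each block to `copies` distinct nodes, round-robin (a simplified sketch)."""
    if copies > len(nodes):
        raise ValueError("need at least as many nodes as copies")
    placement = {}
    for i, block in enumerate(block_ids):
        # Start at a different node for each block so the load spreads out;
        # consecutive indices modulo len(nodes) guarantee distinct holders.
        placement[block] = [nodes[(i + k) % len(nodes)] for k in range(copies)]
    return placement


def still_available(placement, failed):
    """True if every block still has at least one copy on a working node."""
    return all(any(n not in failed for n in holders) for holders in placement.values())


nodes = ["pc-accounts", "pc-reception", "old-pc-1"]  # hypothetical office PCs
placement = place_replicas(["b0", "b1", "b2", "b3"], nodes, copies=2)
print(still_available(placement, failed={"old-pc-1"}))  # True
print(still_available(placement, failed={"pc-accounts", "pc-reception"}))  # False: both copies of b0 are gone
```

With two copies per block, any single machine can fail without data loss; lose both machines holding a given block and that block is gone, which is why more copies (or more nodes) buy more availability.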

Still not convinced? With a storage grid, you can squeeze the most out of old hardware even after it's time to upgrade the rest of your PCs - just add it to the grid and go!

By now, you're probably curious to know how exactly you can turn your office LAN into a storage grid, so here goes.

WebDAV
WebDAV stands for Web-based Distributed Authoring and Versioning. When Tim Berners-Lee conceived the World Wide Web, one of his intentions was to make it a writable medium as well: you should be able to use the Web the way you would use your own hard disk. WebDAV is a set of extensions to HTTP, the protocol used to transfer data over the Web. Most major operating systems come with built-in WebDAV support, letting you work with files on a remote server as if they were stored right on your PC.
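Because WebDAV rides on HTTP, a client talks to a WebDAV server with ordinary HTTP requests that use extra methods such as PROPFIND, which asks for a resource's properties or a folder's contents. The sketch below builds such a request with Python's standard `http.client`; the helper names, host, port and path are placeholders for illustration, not GB-DISK specifics.

```python
import http.client

# Minimal PROPFIND body asking for all properties ("allprop"), as defined by WebDAV.
PROPFIND_BODY = (
    '<?xml version="1.0" encoding="utf-8"?>'
    '<D:propfind xmlns:D="DAV:"><D:allprop/></D:propfind>'
)


def propfind_request(path="/", depth="1"):
    """Return (method, path, body, headers) for a WebDAV folder listing."""
    headers = {
        "Depth": depth,  # "0" = the resource itself, "1" = it plus its immediate children
        "Content-Type": 'application/xml; charset="utf-8"',
    }
    return "PROPFIND", path, PROPFIND_BODY.encode("utf-8"), headers


def list_folder(host, port, path="/"):
    """Send the PROPFIND over plain HTTP; return the status and raw XML reply."""
    method, path, body, headers = propfind_request(path)
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request(method, path, body=body, headers=headers)
        resp = conn.getresponse()
        # A successful listing normally comes back as 207 Multi-Status.
        return resp.status, resp.read().decode("utf-8", errors="replace")
    finally:
        conn.close()
```

For example, `list_folder("192.168.1.10", 8080, "/")` would list the root of a WebDAV server at that (hypothetical) address; for an HTTPS server you would swap in `http.client.HTTPSConnection`.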

GridBlocks DISK
GridBlocks is an open-source project by the Helsinki Institute of Physics. GridBlocks DISK (Distributed Inexpensive Storage with K-Availability) is one of the three applications the project is working on, and its purpose is fairly obvious from the name. Unlike SCSI or SATA RAID arrays, which would set you back a pretty penny, GridBlocks DISK (GB-DISK) makes full use of whatever resources you give it, for next to no investment.
GB-DISK ships as Java .jar files, so it runs on any platform with a Java runtime. Even if your grid is a mix of Windows and Linux servers, you should face no problems. If you wish to try out GB-DISK, you can find it on this month's DVD.

How GB-DISK Works
The GB-DISK architecture consists of two main components: the Front End service (FE) and the Storage Element service (SE). Users interact with the FE, which breaks each file into "stripes" and spreads them across the SEs on the network. The FE also saves the file's metadata (data about the file, including where each stripe was sent) to the Infosystem, where it is accessible to all GB-DISK services on the network.
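To make the striping step concrete, here is a minimal Python sketch. The stripe size, round-robin placement and random fragment names are assumptions for illustration (the article does not describe GB-DISK's actual stripe size or placement policy), and plain dictionaries stand in for each SE's disk and for the Infosystem entry.

```python
import uuid

STRIPE_SIZE = 4  # bytes; tiny so the example is easy to follow (real systems use far larger stripes)


def stripe_file(name, data, storage_elements, stripe_size=STRIPE_SIZE):
    """Split `data` into stripes, hand them out round-robin, return (metadata, per-SE stores)."""
    stores = {se: {} for se in storage_elements}  # stands in for each SE's disk
    metadata = {"name": name, "size": len(data), "stripes": []}  # stands in for the Infosystem record
    for index, start in enumerate(range(0, len(data), stripe_size)):
        se = storage_elements[index % len(storage_elements)]
        fragment_id = uuid.uuid4().hex  # opaque fragment name, unrelated to the original file name
        stores[se][fragment_id] = data[start:start + stripe_size]
        metadata["stripes"].append({"index": index, "se": se, "fragment": fragment_id})
    return metadata, stores


meta, stores = stripe_file("memo.txt", b"Hello, grid storage!", ["se1", "se2"])
print([s["se"] for s in meta["stripes"]])  # ['se1', 'se2', 'se1', 'se2', 'se1']
```

Note that no single SE ends up with a readable copy of the file; without the metadata, its fragments are just a pile of oddly named chunks.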

The GridBlocks DISK architecture

Downloading files from the grid works the other way round. When a user requests a file through the FE, the FE uses the metadata to work out where it stored each fragment of the file and begins downloading them. Once all the fragments have arrived, the FE strings them together to rebuild the original file.
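The reverse step can be sketched the same way. In this hypothetical Python version, the metadata is a list of (stripe index, SE name, fragment name) records and dictionaries stand in for the SEs; a real FE would download each fragment over the network instead of reading a dictionary.

```python
def reassemble(metadata, stores):
    """Fetch every stripe listed in the metadata and join them back in order."""
    parts = []
    for stripe in sorted(metadata["stripes"], key=lambda s: s["index"]):
        se_store = stores[stripe["se"]]  # in a real grid: a download from that SE
        if stripe["fragment"] not in se_store:
            raise OSError(f"stripe {stripe['index']} missing on {stripe['se']}")
        parts.append(se_store[stripe["fragment"]])
    data = b"".join(parts)
    if len(data) != metadata["size"]:
        raise OSError("reassembled size does not match metadata")
    return data


metadata = {
    "name": "memo.txt",
    "size": 11,
    "stripes": [  # deliberately out of order: the index, not list position, decides placement
        {"index": 1, "se": "se2", "fragment": "f-b"},
        {"index": 0, "se": "se1", "fragment": "f-a"},
        {"index": 2, "se": "se1", "fragment": "f-c"},
    ],
}
stores = {"se1": {"f-a": b"Hell", "f-c": b"rld"}, "se2": {"f-b": b"o wo"}}
print(reassemble(metadata, stores))  # b'Hello world'
```

The size check is a cheap sanity test; a production system would more likely verify a checksum per fragment.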

All this, of course, isn't apparent to the user - all he or she will see is a plain old Web page or Web folder.
Firing It Up
Before you start using GridBlocks, you will need the Java Runtime Environment, which you will find on this month's DVD. Once you've installed it, open the Windows command prompt (Start > Run > "cmd") and navigate to the folder you extracted the GB-DISK files to. To start the Front End service, use this command:

java -jar gb-disk-fe-0.8.1.jar

The Front End, as seen through your web browser

What happens to your files in the Storage Elements

Once you've started a Front End, you need to complement it with at least one Storage Element. You can also run the SE service on the same PC as the FE, so that machine takes part in the actual storage as well. To start the SE, use this command:
java -jar gb-disk-se-0.8.1.jar

You will want to set up multiple Storage Elements, certainly more of them than Front Ends. That way, some machines can be dedicated purely to storage rather than having to handle both storage and the interface.

Once you're done setting up Front Ends and Storage Elements, you can start using the grid. You can either use a browser to get your work done, or set up a Web folder. To use your browser, just navigate to the following address:
https://[IP Address of Front End]:8080/gb-disk

Using this interface, you can now choose the files you want to upload to the grid. To see what becomes of your files, use the same address but replace the Front End's IP address with that of a Storage Element. You will see a bunch of cryptic file names: these are the fragments of the files you uploaded through the Front End.

If you would rather not use the browser and want something that looks more like the familiar Windows folders, you can set up a Web Folder, since GridBlocks supports the WebDAV protocol (see the WebDAV box). Just open My Network Places and select Add Network Place. When asked where you would like to create this network place, select "Choose another Network Location" and click Next. In the Internet address field that comes up, enter the Front End address given above. You can now browse the contents of the storage grid just like an ordinary folder.

Setting up the Web Folder

Ah, the comfy world of Explorer-like folders

Finally, if you're feeling particularly geeky, you can use the command line to manage files on the grid. Using the command prompt, navigate to the folder with the GridBlocks .jar files and use the following command:
java -jar gb-disk-0.8.1.jar [put/get/delete/exists] [file name] [FE URL]
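If you want to script those operations, a thin wrapper can assemble the same command line. This Python sketch assumes `java` is on the PATH and the jar sits in the current folder; it reproduces only the four verbs and the argument order shown above, and the helper names are hypothetical. The article doesn't say what the client prints or which exit codes it uses, so the wrapper simply passes them back.

```python
import subprocess

JAR = "gb-disk-0.8.1.jar"
ACTIONS = {"put", "get", "delete", "exists"}


def gb_disk_command(action, file_name, fe_url, jar=JAR):
    """Build the argument list for: java -jar <jar> <action> <file name> <FE URL>."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action {action!r}; expected one of {sorted(ACTIONS)}")
    return ["java", "-jar", jar, action, file_name, fe_url]


def run_gb_disk(action, file_name, fe_url):
    """Run the client; return its exit code and combined stdout/stderr text."""
    result = subprocess.run(
        gb_disk_command(action, file_name, fe_url),
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout + result.stderr
```

For instance, `run_gb_disk("put", "report.doc", fe_url)` uploads a file, where `fe_url` is your Front End's URL (its exact form is not documented here, so check it against your setup).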

Adding more machines to the grid is as easy as can be: just launch the SE service on each new machine. Launching the Storage Element service automatically sends out a message to all the other machines, effectively saying, "Hey, I'm here to share the load too!"
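The article doesn't say how that "I'm here" message travels. A common way to reach every machine on a LAN is a UDP broadcast, so here is a Python sketch of the idea under that assumption; the port number, message format and function names are all hypothetical, not GB-DISK's actual protocol.

```python
import json
import socket

ANNOUNCE_PORT = 50505  # hypothetical port, not GB-DISK's


def make_announcement(host, port):
    """Encode a hypothetical 'I'm here' message for a new Storage Element."""
    return json.dumps({"type": "se-hello", "host": host, "port": port}).encode("utf-8")


def broadcast_announcement(message, port=ANNOUNCE_PORT):
    """Send the message to every machine on the local subnet via UDP broadcast."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(message, ("<broadcast>", port))


def handle_announcement(raw, known_ses):
    """Receiving side: add the announcing SE to the set of known storage nodes."""
    msg = json.loads(raw.decode("utf-8"))
    if msg.get("type") == "se-hello":
        known_ses.add((msg["host"], msg["port"]))
    return known_ses
```

A listener on each machine would bind a UDP socket to `ANNOUNCE_PORT`, pass every datagram it receives to `handle_announcement`, and from then on include the new SE when handing out stripes.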

Nearly There
With all this capability, and completely free to boot, you would probably want to embrace GB-DISK with open arms right about now. Be warned, though - it is not without its limitations. Firstly, support for subdirectories is shaky, and you will most likely encounter failure messages if you try uploading folders.

Secondly, while it does support authenticating users to have their own private folders (you will find a "private" folder through the front end), this feature is buggy at best and not recommended for use.

Finally, with a dearth of proper documentation, configuring GB-DISK can be quite the pain in the hindquarter area.

All this aside, if what you're looking for is a place to dump files where they will be available almost perpetually - even after failures of one or more machines - this is the solution for you.

On the bright side, the GridBlocks team seem to be working night and day to perfect their program. Scarcely had we finished bundling version 0.8 of the software on the DVD when they came out with version 0.9! You can track their progress at https://gridblocks.sourceforge.net.

Team Digit

All of us are better than one of us.