Get a handle on Unstructured Data

May 8, 2008

One of the big topics in data management these days is Unstructured Data.  What is it?  Word documents, spreadsheets, video, images, email, and instant messaging are a few examples.  How do you harness the wealth of information contained in these non-standardized formats if you want to capitalize on your existing data management infrastructure?  Microsoft has attempted to answer this question with its upcoming release of SQL Server 2008 (SS2008).

Due out later this year, SS2008 provides built-in support for Unstructured Data through the FILESTREAM functionality.  FILESTREAM combines the power of a relational database platform with the storage flexibility of an NTFS file system.  This is accomplished by storing references within the database to binary large objects (BLOBs) residing on the file system.  In this fashion, SS2008 manages access to and interaction with the information, but is not responsible for storing it directly.  Unstructured Data can be accessed through typical Transact-SQL statements or via Win32 API calls.  FILESTREAM is a good option to consider when the objects being stored are larger than 1 MB; the size of an object is limited only by the volume size of the underlying file system.  If objects are smaller than 1 MB on average, you’ll get better performance by storing them as varbinary(max) values directly within the database.
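
As a rough illustration of the 1 MB rule of thumb above, here is a minimal Python sketch.  It is not part of SQL Server or any Microsoft tool; the function name, the returned labels, and the choice to treat an average of exactly 1 MB as "small" are all assumptions made for the example.

```python
# Rule-of-thumb helper based on the 1 MB guideline described in the post.
# Hypothetical names throughout; nothing here calls SQL Server.

ONE_MB = 1024 * 1024  # bytes; assumes the binary definition of a megabyte


def suggest_blob_storage(object_sizes_bytes):
    """Suggest a storage option from the average object size in bytes."""
    if not object_sizes_bytes:
        raise ValueError("need at least one object size")
    average = sum(object_sizes_bytes) / len(object_sizes_bytes)
    # Above 1 MB on average: FILESTREAM is worth considering.
    # At or below 1 MB (an assumption for the boundary case):
    # in-database varbinary(max) is likely to perform better.
    return "FILESTREAM" if average > ONE_MB else "varbinary(max)"


if __name__ == "__main__":
    # Small scanned forms versus large video clips (made-up sizes).
    print(suggest_blob_storage([200_000, 350_000]))
    print(suggest_blob_storage([5 * ONE_MB, 12 * ONE_MB]))
```

Treat the result as a starting point for testing with your own data, not as a hard cutoff.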

From a security standpoint, FILESTREAM fits neatly into the database.  If a user has permission to query a table and column containing FILESTREAM data, they are able to access the Unstructured Data.  This access, however, does not carry over to the file system.  Only the account under which the SQL Server service runs has access to the files at the file system level.

Is this the only way to deal with Unstructured Data?  Of course not, but it is an option.  There are some limitations when using FILESTREAM with other SS2008 functionality.  Special consideration needs to be given when utilizing Database Snapshots, Mirroring, Replication, Log Shipping, and Clustering.

Continue to browse through other blogs on www.thefuturevalueofbusiness.com to see conversations on SharePoint 2007 and its role in taming Unstructured Data.

Dave
