Next Generation of the Data Warehouse (DW 2.0)
April 10, 2008
I had coffee today with Derek Strauss, one of the authors of the new book, “DW 2.0: The Architecture for the Next Generation of Data Warehousing”. Derek worked with William (Bill) H. Inmon and Genia Neushloss to write the book, and it will be available in July of this year (’08). His company (Gavroshe) is becoming the first independent consulting company certified by Bill Inmon as DW 2.0 Architects. We talked about why DW 2.0 exists and what makes it different, and he had some really interesting comments.
First of all, there is that big statistic that Gartner has reported…“50% of Data Warehousing will have limited acceptance or be considered failures.” Derek commented that it’s frustrating to work in that area and hear that statistic bandied about all the time. He challenges the premise…what is meant by the term ‘data warehouse’? After all, one man’s ODS is another’s datamart is another’s warehouse is another’s database is another’s xls sheet containing GL entries. So setting a standard (DW 2.0) sets the baseline; OK, now we know what a data warehouse is.
Second, they wanted to bring corrections to the DW environment. There are real issues around the traditional notion of a data warehouse. We talked about two of them at length.
Data currency – lifecycle of data. Guess what, data has a lifecycle. Who knew? Well, some of us experienced folks have known this for a while – but within DW 2.0, this notion gets a separation of concerns (to steal an SOA term). Think about what happens over the years as data decays. The use of an attribute changes over time as the business’ competitive landscape changes and the business responds. What happens to that data – do we declare a new attribute at some point, do we change it and push the new meaning upstream, do we just live with it? One of the key pillars (or sectors, to use Inmon’s terminology) of DW 2.0 is that the lifecycle of data is embraced. Here is how it nets out… Refer to Figure 1 at the end of this entry.
- Interactive Sector - The place where high performance data warehouse processing occurs (Very current data – Transaction data)
- Integrated Sector - The place where integrated data resides (Current data – Relative to the business needs; hourly, daily, etc…)
- Near Line Sector - The place where data with a lower probability of access resides (Less than current data – Also relative to the business needs; weekly, monthly, quarterly, etc…)
- Archival Sector - The place where data with a truly low probability of access resides (Older data – We all know what goes here :) )
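To make the four sectors a bit more concrete, here is a minimal Python sketch that routes a record to a sector based on its age. The age cutoffs (one day, one year, five years) and the names `Sector`, `CUTOFFS`, and `sector_for` are my own assumptions for illustration only; DW 2.0 leaves the actual boundaries to the needs of the business.

```python
from datetime import date, timedelta
from enum import Enum


class Sector(Enum):
    INTERACTIVE = "Interactive"
    INTEGRATED = "Integrated"
    NEAR_LINE = "Near Line"
    ARCHIVAL = "Archival"


# Hypothetical age cutoffs, checked youngest first.
# Real boundaries depend on the business (hourly, daily, quarterly, ...).
CUTOFFS = [
    (timedelta(days=1), Sector.INTERACTIVE),
    (timedelta(days=365), Sector.INTEGRATED),
    (timedelta(days=5 * 365), Sector.NEAR_LINE),
]


def sector_for(record_date: date, today: date) -> Sector:
    """Return the sector a record belongs in, based only on its age."""
    age = today - record_date
    for cutoff, sector in CUTOFFS:
        if age < cutoff:
            return sector
    # Anything older than the last cutoff goes to the archive.
    return Sector.ARCHIVAL
```

Age is only a stand-in here for “probability of access”; a real implementation would more likely look at how often the data is actually used.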
Unstructured data – emails, documents, faxes, etc… Traditional DWs had zero notion of unstructured data. A good analogy is the communication that occurs between two people. The experts say that 80% of that communication is non-verbal and only 20% consists of the actual words we say (fun topic for another time). The same is true with data. For example…a CRM promises to give you a 360 degree view of your customer. Great, except that it forgot about the communications that we had with the customer. It’s like saying that the complete view is made with only 20% of the data. The DW 2.0 architecture makes use of ‘probabilistic matching’ when looking at relationships between unstructured and unstructured data as well as between structured and unstructured data. Now you have a 360 degree view of your customer!
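For a rough feel of what matching structured to unstructured data can look like, here is a small Python sketch that links customer records to documents by comparing names. It uses plain string similarity from the standard library’s `difflib` as a stand-in; it is not the matching technique described in the book, and the function names, sample data, and the 0.85 threshold are all assumptions of mine.

```python
from difflib import SequenceMatcher


def match_score(a: str, b: str) -> float:
    """Similarity between two names, from 0.0 (no match) to 1.0 (identical)."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def link_documents(customers, documents, threshold=0.85):
    """Pair customer ids with document ids whose sender name is a likely match.

    customers: {customer_id: customer_name}   (structured data)
    documents: {document_id: sender_name}     (taken from emails, faxes, ...)
    threshold: hypothetical cutoff; tune it against real data.
    """
    links = []
    for cust_id, cust_name in customers.items():
        for doc_id, sender in documents.items():
            score = match_score(cust_name, sender)
            if score >= threshold:
                links.append((cust_id, doc_id, round(score, 2)))
    return links


# Made-up sample data: the email sender is a misspelling of the customer name.
customers = {1: "Jonathan Smith"}
documents = {"email-7": "Jonathon Smith", "fax-2": "Maria Lopez"}
print(link_documents(customers, documents))
```

The point is that the link is a likelihood rather than a key lookup: the misspelled email sender still scores high enough to be attached to the customer, while the unrelated fax does not.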
Enterprise Metadata Repository. We did not talk much about the metadata aspect because no one really likes talking about metadata :). But threaded throughout the architecture is the notion of metadata that has enterprise reach and is natural to the data warehouse – a tight coupling.
DW 2.0 is exciting for the reasons outlined above. It was a struggle on several levels to have ‘one warehouse’ with no notion of data lifecycle. On one level, it’s hard to treat all data equally when you know that some of it should be retired and some is volatile because it was just born, and that the teenaged data that behaves badly is managed in the same place, and the same way, as the 20/30 something data that is more mature. Also, when dealing with the finance folks, it’s easier to speak their language. No need to request money for another ODS when we already have a data warehouse…rather, we need to address the data lifecycle…data depreciation – or, planning ahead, data accrual! Once they understand that data is an asset and that assets depreciate or have a lifecycle, we can engage in conversations about how we need to manage this condition.
Of course, since DW 2.0 has separation of concerns around currency, we can employ the technology that makes the best sense for each stage of the data (Oracle to handle the Interactive Sector, SQL Server for both the Integrated and Near Line Sectors, and archiving on DB2 – just an example).
Then leveraging emerging investments in MOSS becomes strategic to the warehouse (fodder for another day).
Lastly, since DW 2.0 has been trademarked, there will be no mistake about its composition. Only the original authors and architects will be able to make a change. Given the track record of these folks, I think that is a good idea. For more information on DW 2.0 visit http://inmoncif.com/home/.
In summary, DW 2.0 Advantages…
- Hold data at the lowest detail,
- Hold data to infinity (or at least to your retirement),
- Not cost huge amounts of money,
- Have integrity of data and still have online high-performance transaction processing,
- Link structured data and unstructured data,
- Tightly couple metadata to the data warehouse environment,
- Support different kinds of processing without sacrificing response time, and
- Support changes of data over time.
Take Care,
Scott Felten 2.0
Figure 1 - DW 2.0 Architecture
Comments
…and remember…the faster you implement and solve my business problem, the happier I will be.
Signed…the voice of your customer
…oh yeah…and as your customer, I don’t care what’s under the hood - I just want a car. Just show me where to put the key, tell me how to keep it going, and when in doubt give me the number of a good technician that can fix it when it’s broke.
(For those a little unaware…that means…don’t teach your customers how to build the DW, they just want to use it when you’re finished.)