Business Intelligence Symposium VI Focuses on Innovative Analytics

October 17, 2011

LÛCRUM Inc., in conjunction with the University of Cincinnati’s Carl H. Lindner College of Business, has announced that their 6th Business Intelligence Symposium is scheduled for December 13, 2011, at the Carl H. Lindner College of Business focusing on Innovative Analytics. LÛCRUM Inc. and the Carl H. Lindner College of Business have been organizing these events for business executives and IT professionals that use and need analytical data to provide a forum for sharing ideas, stories, experiences and business cards.

Confirmed speakers for the event are:


Drew Boyd

Drew Boyd is a recognized authority, thought leader, educator, and practitioner in the fields of innovation, persuasion, and social media. He is the Executive Director of the Master of Science in Marketing Program and Assistant Professor of Marketing and Innovation at the University of Cincinnati.

Ginny Walker

Virginia (Ginny) Walker is currently GE Aviation’s Chief Enterprise Architect, reporting to the CIO. She grew up in Michigan, obtained her undergraduate degree in Systems Analysis at Miami (OH) University, an MBA from Xavier (OH), and has worked 25 years for GE in Aviation, Corporate and Energy. She has held a diverse span of Information Technology roles…from application development, to data warehouse and data management, to strategic planning, outsourcing, data center management, and ERP implementations.

Jim Goetz

Jim is the Chief Information Officer at Convergys.  He is responsible for the planning, development, and delivery of the technology products and services for the Company globally, including external and internal clients. He reports to Jeff Fox, president and chief executive officer of Convergys.

Mike Cholak

Mike Cholak is the Vice President of Customer Intelligence Services at Convergys.  He leads a team dedicated to delivering a full suite of consulting services to the Company’s clients that help leverage customer intelligence and feedback to optimize long- term customer loyalty and profitability.

For more information:

 

 

 

 

 

 

 

 

 

 

 

 

Data Vault: The Preferred “flavor” for DW Architecture in BI – Part II

October 7, 2011

In Part-I, I explained the place of Data Vault (DV) in Enterprise Data Warehouse Architecture. Now let’s look at different DV entities, rules for each entity and why Dan Lindstedt calls DV a “hybrid” approach. This minimal understanding is necessary before diving into the differences between the various modeling techniques.

The main entities of Data Vault are Hub, Link and Satellite.

HUB Entity (HUB_): This is a defining entity. It contains a unique list of business keys. These are the keys that businesses utilize in everyday operations. For example, employee number, SSN number, Product Code. So the attributes of HUB are:

  • Surrogate Key – This is a Primary Key of hub and holds 1-to-1 relationship with the Business Keys.
  • Business Key – This is a Primary Key of the source system. This can be a composite key. ETL checks this key’s existence in the hub table and inserts one if it doesn’t exist.
  • Load Date Time – The datetime of the key / record when it was first loaded into the table.
  • Record Source – The name of the source the record originated from. This is useful for data traceability.
  • Record Begin Date Time – The datetime when the record became active in the source (if available) or the datetime when ETL has been run.
  • Record End Date Time – The datetime when the record is closed. This can only be detected if the logical deletes are supplied or derived in some manner.

LINK Entity (LINK_): LINKS are constructed once all the HUBS are identified. Links are relationship entities.  These are the physical representation of m-to-m 3NF relationship. It represents the relationship or transaction between hubs. The link table contains the unique list of relationships between hub keys. When a relationship arrives, it simply gets loaded into the table if doesn’t exist. Typically, the link tables translate into fact tables in the datamart access layer. For example, the link between employee number and the project number. The other attributes of LINK are:

  • Surrogate Key – This is a Primary Key of the table and is useful when a link contains more than two hub keys as composite key might cause performance problems. This is also
    useful when the granularity of the link changes (a hub key is added) or history needs to be maintained on the relationships.
  • Hub Key 1 to Hub Key N – The surrogate keys from the hub tables that are involved in the relationship.
  • Load Date Time- The datetime when the record was loaded into the table.
  • Record Source – The source system name from where the record or relationship was loaded from.

SAT Entity (SAT_): SATS holds descriptive information about the hub keys or the relationships. The satellite is most closely resembles Type 2 Dimension. When the data changes, a delta record is inserted into the table and if the certain columns changes faster than others then these can be split into two different tables to avoid data replication. For example, employee details such as employee name, address, phone number, email address in the satellite off of hub  or time spent by an employee on a certain project in satellite off of LINK that stores the relationship between an employees and projects. The other attributes of SAT are:

  • Hub or Link Surrogate Key from HUB or LINK table. This is part of the primary key.
  • Load Date Time – The datetime when the record was inserted into the table. This is part of the primary key.
  • Surrogate Key – This is optional. It is useful when satellites have multiple values such as multiple home addresses.
  • Record Source – The name of the source.
  • Record Begin Date Time – The datetime when the record became active in the source (if known) or the datetime when ETL has been run.
  • Record End Date Time – The datetime when the record is closed.

And stand-alone tables such as calendars, time, code and description tables may be used.

 

Below is a snippet of a Data Vault Model housing borrowers who have taken out Student Loans:

 

 

Modeling Rules for Each Part of the Entity:

FOR HUBS:

  • Hubs keys cannot migrate into other hubs (no parent/child like HUBS).
  • Hubs must be connected through links.
  • More than two hubs can be connected through links.
  • Surrogate keys may be used.
  • Business keys are 1 to 1 relationship with surrogate keys.
  • Hubs primary keys always migrate outward.
  • Hub business keys and primary keys never change.
  • If a hub has two or more satellites, then a point-in-time table can be built for ease of joins.
  • An ‘UNKNOWN’ business key record can be inserted into Hub that can be used to tie other data in links and sats that has no business keys in source. This kind of data is usually a bad/incomplete source data.

FOR LINKS:

  • Links can be connected to other links.
  • Links must have atleast two hubs associated with them in order to instantiated.
  • Surrogate keys may be used.
  • The combination of surrogate business keys made a unique key.
  • Does not contain descriptive data.
  • Does not contain begin and end dates.

FOR SATS:

  • Satellites may be connected to hubs or links.
  • Have 1 and only 1 parent table.
  • Satellites always contain either a load date-time stamp, or a numeric reference to a stand-alone load date-time sequence table.
  • Primary key is a combination of ‘surrogate key’ from either hub or link and the load datetime stamp.
  • Surrogate keys may not be used.
  • Must have a Load End Date to indicate when the CHANGE to the data set has occurred.
  • Satellites are always delta driven. Duplicate rows should not appear.
  • Data is separated into satellite structures based on 1) type of information 2) rate of change.

DV model utilize bits of both 3rd Normal Form and Dimension Modeling concepts.  This approach has made the model simple, flexible, expandable, adaptable and consistent.

  • Adapted many-to-many physical relationship structure from 3NF that became a LINK table.
  • The LINK table is also similar to factless fact in Start Schema.
  • Adapted the notion of 1 to 1 (business key to surrogate key) tracking from dimensional modeling (type 1 dimension).
  • Adapted the notion of “data over time in a separate table/structure” from dimensional modeling (type 2 dimension). This resulted in a SAT table however it is fundamentally
    different, in that it is a child dependent table, whereas the dimension is a parent table to the facts.

This is it for now. In next post(s) we will look into some examples which will show how Data Vault technique overcomes the limitations of 3NF and Dimensional Model structures when applied as an Enterprise Data Warehouse.

- Jyothi

 

Source: tdan.com, danlinstedt.com

IT Resources in 2011

October 5, 2011

iPhone4S

iPhone4S

Like most of you, I recieve a lot of publications in my inbox each day.  This morning, an interview in Intelligent Utilitycaught my attention.  The interview, IT’s Workforce of the Future, is a very interesting observation by a CIO who has been in the workforce for less than 20 years.  In the interview the CIO, Branndon Kelley reminds us that in 1999 our workplace had better, faster technology than we had at home.  Today, for most, that simply isn’t the case.  We have a new workforce emerging that has been immersed in technology since birth and want to bring that to their new post-college positions.

Branndon issued a challenge for those of us “old timers” in IT.  Certainly our experience has taught us to be cautious and deliberate and to plan our project carefully.  These new techies, however, are more likely to jump in with both feet and more optimism.  Neither is certainly a perfect strategy but their is a lot to be learned from both.  Business is changing faster and faster every day.  When evaluating your projects here are some things you may want to consider:

  1. Is there truly a compelling business reason why I couldn’t move my data to the Cloud?
  2. How can I inspire my technical team to think more creatively and collaboratively?
  3. Why does my staff need a permanent desk at my site? 
  4. Can I find the best resources in my town/headquarters?  What if I was not limited by geography?
  5. What tools do I need to get in place now so that I can enable my teams to push the boundaries of location, work hours, and technology?

Food for thought indeed.  How would you respond to the questions above?

 - Jodie

 

Pervasive BI Enabled – Data Vault

October 2, 2011

There were a few interesting posts this past week as it relates to the Data Vault Methodology and emerging trends in Business Intelligence (BI). These are must reads for BI practitioners who wish to stay current on where the industry is heading as well as how and why things should be done in a certain way or method. Again, as Dan Linstedt, the inventor of the methodology is known to say, the data vault was created to solve specifically the Enterprise Data Warehouse (EDW) problem.

The first entry is the slide deck from the recent Advanced Architecture Conference in Denver: http://danlinstedt.com/datavaultcat/datavault-advanced-architecture-conference-slides/

There is a wealth of information contained herein that outlines at a high level the benefits and reasons why the data vault method is enabling pervasive BI. It goes into details about comparing 3rd normal form and the star schema form to the value the data vault brings to the table. It outlines the components of a data vault, including the hubs, links, satellites, PIT tables, and bridge tables.

It was nice to see the connection with the methodology to emerging tools. There are now more than a handful of tools that are undertaking the task of automating the ETL and loading processes of an EDW. The method of the data vault allows for a lot of the technical innovation. Dan presents the case well that businesses today want answers faster and cheaper today as well as having the system flexible enough to evolve with the business over time. Most EDW systems today fall short of being pervasive, and often that is a methodology problem that appears to have an answer.

In addition to these topics, Dan pushed forth the proposition of an “Operational Data Vault” and began to lay out how this connects to Business Intelligence. This section was very interesting and helped to turn on a few light bulbs and progressive thoughts. I love the idea of pushing the intelligence as far as possible while still keeping the style and benefits. I agree with Dan that this is still so new that there are not yet any vendor applications or tools that can help with this…yet.

The presentation does leave a little bit of gap with the more advanced concepts, such as change management and how to leverage a data vault using agile principles. It is also a bit light on how to drive a project plan as well as what to watch out for while attempting to build a data vault style EDW. All in all, he presents a very compelling case as to why the data vault should be strongly considered when delivering Business Intelligence.

The second entry is a public rebuttal to a challenge about the validity of the data vault methodology:
http://danlinstedt.com/datavaultcat/datavault-new-response-to-frank-habers-part-1/

As Dan states, he felt the need to lay out the facts and correct misconceptions from the original author. After you get past the first few sections of disclaimers and background, the article does a great job of comparing and contrasting methodologies and technical styles for building the EDW.

I often see HDSA (persistent historical data storage area) implemented at client sites and they sometimes go by different names (ODS, staging, etc.). I have yet to see a better discussion of the merits of a data vault compared to the HDSA as he presents here. If you are in an environment that has an ODS or persistent history area, please take a moment to read this section, because the warnings and issues are real and one should not make decisions here lightly because of the long term ramifications.

Dan also goes into great detail to itemize the benefits and value proposition of why you take the steps to have a data vault at the core of the EDW.

I would say that the overall tone of the article still makes it a bit hard to read, but if you can overlook the defensive stance, you will find many gems that will help solidify and justify the purpose and value of the data vault compared to any other technique.