Data Vault: The Preferred “flavor” for DW Architecture in BI – Part II
October 7, 2011
In Part I, I explained the place of Data Vault (DV) in Enterprise Data Warehouse architecture. Now let’s look at the different DV entities, the rules for each entity, and why Dan Linstedt calls DV a “hybrid” approach. This minimal understanding is necessary before diving into the differences between the various modeling techniques.
The main entities of Data Vault are Hub, Link and Satellite.
HUB Entity (HUB_): This is a defining entity. It contains a unique list of business keys. These are the keys that businesses utilize in everyday operations, for example employee number, Social Security number (SSN), or product code. The attributes of a HUB are:
- Surrogate Key – This is the primary key of the hub and holds a 1-to-1 relationship with the business key.
- Business Key – This is the primary key from the source system. It can be a composite key. ETL checks whether this key exists in the hub table and inserts it if it doesn’t.
- Load Date Time – The datetime of the key / record when it was first loaded into the table.
- Record Source – The name of the source the record originated from. This is useful for data traceability.
- Record Begin Date Time – The datetime when the record became active in the source (if available) or the datetime when ETL has been run.
- Record End Date Time – The datetime when the record is closed. This can only be detected if the logical deletes are supplied or derived in some manner.
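The hub load described above (check whether the business key exists, insert it if it doesn’t) can be sketched roughly as follows. This is only a minimal illustration using Python’s built-in sqlite3 module; the table name hub_employee, its columns, and the helper functions are my own assumptions, not part of any Data Vault standard.

```python
import sqlite3
from datetime import datetime, timezone


def create_hub(conn):
    # Hypothetical hub table; names are illustrative only.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS hub_employee (
            employee_sk   INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
            employee_no   TEXT NOT NULL UNIQUE,               -- business key
            load_dts      TEXT NOT NULL,                      -- first load time
            record_source TEXT NOT NULL                       -- originating system
        )
        """
    )


def load_hub(conn, business_keys, record_source):
    """Insert business keys not yet in the hub; return how many were added."""
    load_dts = datetime.now(timezone.utc).isoformat()
    inserted = 0
    for key in business_keys:
        exists = conn.execute(
            "SELECT 1 FROM hub_employee WHERE employee_no = ?", (key,)
        ).fetchone()
        if exists is None:
            conn.execute(
                "INSERT INTO hub_employee (employee_no, load_dts, record_source) "
                "VALUES (?, ?, ?)",
                (key, load_dts, record_source),
            )
            inserted += 1
    return inserted
```

Because an existing key is never touched, the load date time and record source keep pointing at the first load of that key, which is what makes the hub useful for traceability.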
LINK Entity (LINK_): LINKS are constructed once all the HUBS are identified. Links are relationship entities: the physical representation of a many-to-many 3NF relationship. A link represents the relationship or transaction between hubs, and the link table contains the unique list of relationships between hub keys. When a relationship arrives, it simply gets loaded into the table if it doesn’t exist. Typically, the link tables translate into fact tables in the data mart access layer. An example is the link between employee number and project number. The other attributes of a LINK are:
- Surrogate Key – This is a Primary Key of the table and is useful when a link contains more than two hub keys as composite key might cause performance problems. This is also useful when the granularity of the link changes (a hub key is added) or history needs to be maintained on the relationships.
- Hub Key 1 to Hub Key N – The surrogate keys from the hub tables that are involved in the relationship.
- Load Date Time – The datetime when the record was loaded into the table.
- Record Source – The name of the source system from which the record or relationship was loaded.
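A link load follows the same pattern as a hub load, except that uniqueness is on the combination of hub surrogate keys. Below is a rough sketch for the employee/project example, again with sqlite3; the table link_employee_project and the function names are assumptions made up for illustration.

```python
import sqlite3
from datetime import datetime, timezone


def create_link(conn):
    # Hypothetical link between an employee hub and a project hub.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS link_employee_project (
            link_sk       INTEGER PRIMARY KEY AUTOINCREMENT,  -- link surrogate key
            employee_sk   INTEGER NOT NULL,                   -- hub key 1
            project_sk    INTEGER NOT NULL,                   -- hub key 2
            load_dts      TEXT NOT NULL,
            record_source TEXT NOT NULL,
            UNIQUE (employee_sk, project_sk)                  -- one row per relationship
        )
        """
    )


def load_link(conn, relationships, record_source):
    """Insert (employee_sk, project_sk) pairs not yet present; return count added."""
    load_dts = datetime.now(timezone.utc).isoformat()
    inserted = 0
    for employee_sk, project_sk in relationships:
        exists = conn.execute(
            "SELECT 1 FROM link_employee_project "
            "WHERE employee_sk = ? AND project_sk = ?",
            (employee_sk, project_sk),
        ).fetchone()
        if exists is None:
            conn.execute(
                "INSERT INTO link_employee_project "
                "(employee_sk, project_sk, load_dts, record_source) "
                "VALUES (?, ?, ?, ?)",
                (employee_sk, project_sk, load_dts, record_source),
            )
            inserted += 1
    return inserted
```

Adding a third hub key later would mean a new column and a wider UNIQUE constraint, which is one reason a separate link surrogate key is convenient.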
SAT Entity (SAT_): SATS hold descriptive information about the hub keys or the relationships. The satellite most closely resembles a Type 2 dimension. When the data changes, a delta record is inserted into the table. If certain columns change faster than others, they can be split into two different tables to avoid data replication. For example, employee details such as name, address, phone number, and email address go in a satellite off of the hub, while time spent by an employee on a certain project goes in a satellite off of the LINK that stores the relationship between employees and projects. The other attributes of a SAT are:
- Hub or Link Surrogate Key from HUB or LINK table. This is part of the primary key.
- Load Date Time – The datetime when the record was inserted into the table. This is part of the primary key.
- Surrogate Key – This is optional. It is useful when satellites have multiple values such as multiple home addresses.
- Record Source – The name of the source.
- Record Begin Date Time – The datetime when the record became active in the source (if known) or the datetime when ETL has been run.
- Record End Date Time – The datetime when the record is closed.
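To make the delta-driven behavior concrete, here is a minimal sketch of a satellite load: a new row is written only when the descriptive attributes differ from the currently open row, and the previous row is closed at that moment. The table sat_employee_details, its columns, and the function load_sat are hypothetical names chosen for this example, and the load datetime is passed in by the caller to keep the sketch simple.

```python
import sqlite3


def create_sat(conn):
    # Hypothetical satellite hanging off an employee hub.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS sat_employee_details (
            employee_sk   INTEGER NOT NULL,  -- hub surrogate key
            load_dts      TEXT NOT NULL,     -- when this version was loaded
            load_end_dts  TEXT,              -- NULL while the version is current
            record_source TEXT NOT NULL,
            name          TEXT,
            email         TEXT,
            PRIMARY KEY (employee_sk, load_dts)
        )
        """
    )


def load_sat(conn, employee_sk, name, email, load_dts, record_source):
    """Insert a delta row if the attributes changed; return True if a row was added."""
    current = conn.execute(
        "SELECT load_dts, name, email FROM sat_employee_details "
        "WHERE employee_sk = ? AND load_end_dts IS NULL",
        (employee_sk,),
    ).fetchone()
    if current is not None and (current[1], current[2]) == (name, email):
        return False  # nothing changed, so no duplicate row is written
    if current is not None:
        # Close the previous version at the moment the new one is loaded.
        conn.execute(
            "UPDATE sat_employee_details SET load_end_dts = ? "
            "WHERE employee_sk = ? AND load_dts = ?",
            (load_dts, employee_sk, current[0]),
        )
    conn.execute(
        "INSERT INTO sat_employee_details "
        "(employee_sk, load_dts, load_end_dts, record_source, name, email) "
        "VALUES (?, ?, NULL, ?, ?, ?)",
        (employee_sk, load_dts, record_source, name, email),
    )
    return True
```

Loading the same name and email twice leaves a single row; changing the email produces a second row and end-dates the first.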
And stand-alone tables such as calendars, time, code and description tables may be used.
Below is a snippet of a Data Vault Model housing borrowers who have taken out Student Loans:
Modeling Rules for Each Part of the Entity:
FOR HUBS:
- Hub keys cannot migrate into other hubs (no parent/child-like HUBS).
- Hubs must be connected through links.
- More than two hubs can be connected through links.
- Surrogate keys may be used.
- Business keys have a 1-to-1 relationship with surrogate keys.
- Hub primary keys always migrate outward.
- Hub business keys and primary keys never change.
- If a hub has two or more satellites, then a point-in-time table can be built for ease of joins.
- An ‘UNKNOWN’ business key record can be inserted into the hub and used to tie together data in links and sats that has no business key in the source. This kind of data is usually bad or incomplete source data.
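One way to picture the point-in-time table mentioned in the hub rules: for a single hub key and a set of snapshot dates, record which satellite row (identified by its load date) was in effect in each satellite at that date. The sketch below is plain Python and is only an assumption about how such a structure could be derived; the function build_pit and the satellite names are invented for the example, and dates are ISO strings so that they sort correctly as text.

```python
from bisect import bisect_right


def build_pit(snapshot_dates, sat_load_dates):
    """Build point-in-time rows for one hub key.

    snapshot_dates: list of ISO date strings.
    sat_load_dates: dict of satellite name -> sorted list of ISO load dates.
    Each result row maps every satellite to the latest load date at or
    before the snapshot, or None if the satellite had no row yet.
    """
    rows = []
    for snapshot in snapshot_dates:
        row = {"snapshot_dts": snapshot}
        for sat_name, load_dates in sat_load_dates.items():
            position = bisect_right(load_dates, snapshot)
            row[sat_name] = load_dates[position - 1] if position else None
        rows.append(row)
    return rows
```

With such rows in place, a query can join each satellite on an exact (hub key, load date) match instead of searching date ranges in every satellite.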
FOR LINKS:
- Links can be connected to other links.
- Links must have at least two hubs associated with them in order to be instantiated.
- Surrogate keys may be used.
- The combination of the hubs’ surrogate keys forms a unique key.
- Does not contain descriptive data.
- Does not contain begin and end dates.
FOR SATS:
- Satellites may be connected to hubs or links.
- Have 1 and only 1 parent table.
- Satellites always contain either a load date-time stamp, or a numeric reference to a stand-alone load date-time sequence table.
- Primary key is a combination of ‘surrogate key’ from either hub or link and the load datetime stamp.
- Surrogate keys may not be used.
- Must have a Load End Date to indicate when the CHANGE to the data set has occurred.
- Satellites are always delta driven. Duplicate rows should not appear.
- Data is separated into satellite structures based on 1) type of information 2) rate of change.
The DV model utilizes bits of both 3rd Normal Form and Dimensional Modeling concepts. This approach has made the model simple, flexible, expandable, adaptable and consistent.
- Adapted many-to-many physical relationship structure from 3NF that became a LINK table.
- The LINK table is also similar to a factless fact table in a Star Schema.
- Adapted the notion of 1 to 1 (business key to surrogate key) tracking from dimensional modeling (type 1 dimension).
- Adapted the notion of “data over time in a separate table/structure” from dimensional modeling (type 2 dimension). This resulted in a SAT table however it is fundamentally different, in that it is a child dependent table, whereas the dimension is a parent table to the facts.
That is it for now. In the next post(s) we will look at some examples that show how the Data Vault technique overcomes the limitations of 3NF and Dimensional Model structures when applied as an Enterprise Data Warehouse.
- Jyothi
Source: tdan.com, danlinstedt.com
Data Vault: The Preferred “Flavor” for DW Architecture in BI – Part I
September 21, 2011
Business Intelligence (BI) is today’s ‘MANTRA’ chanted by almost every business. Companies want to outsmart the competition. Companies are ready to invest big bucks and human power to build a sophisticated BI system so that they can have the knowledge that others don’t and seize on the opportunities in the market before others do. BI shows the Future Value of Your Business.
BI systems need DATA and every business has terabytes of real data which can provide them with the information and knowledge they need to make the right decisions on time. But the key is to turn that data into information in a timely, efficient and effective manner once the WHAT AND WHY questions are answered i.e., what information is needed, what matters and why that is required. In today’s market, every business is in a RACE. The race to conquer others. The race to generate more gains/profits. The race to foresee the risks early on so that they can be avoided. So time is of the essence here.
An optimized BI system integrates large volumes of external and internal near-real-time data to allow management to create opportunities by making intelligent decisions after performing predictive analysis of their approach on the business. A good BI system is like a GPS. An effective GPS is one that not only shows you a route to your destination but also guides you when you hit a roadblock, gives up-to-date information on external conditions (construction / traffic), provides multiple routes to choose from, suggests shorter and faster alternatives, predicts the total time based on your driving behavior, tells you what to expect next, etc. Just knowing the path to your destination is not sufficient. You need to know many other factors during the whole ride to reach your destination on time and without any hurdles.
For a good integrated BI system, a good Data warehouse architecture needs to be in place. Data warehouse architecture is “an integrated set of products that enable the extraction and transformation of operational data to be loaded into a database for end-user analysis and reporting”. Below are the pictorial representations of different “flavors” of DW architectures.
Methodologies used by different architecture:
Kimball’s DW Architecture – Is based on ‘Bottom-UP’ methodology.
Inmon’s DW Architecture – Is based on ‘Top-Down’ methodology.
Dan Linstedt’s Data Vault DW Architecture – Is based on ‘HYBRID DESIGN’
The first two design methods have some limitations for the Data Warehouse layer, such as inflexibility and unresponsiveness to changing departmental needs during the implementation phase, insufficient auditability of data back to its source system, inability to integrate unstructured data, inability to rapidly respond to changes (organizational changes, new ERP implementations), and difficulty loading type 2 dimensions in real time. This is where DATA VAULT came to the rescue. Data Vault follows a ‘HYBRID DESIGN’ methodology, which means a ‘TOP-DOWN ARCHITECTURE WITH A BOTTOM-UP DESIGN’.
The model is a mix of normalized modeling components with type 2 dimensional properties. In this model, the DW serves as a backend system that houses historical data integrated by the business keys. All data (‘good, bad, incomplete’) gets loaded into the data vault, and all the cleansing and application of business rules takes place downstream, i.e., outside the DW. This means that the Data Vault model is geared to be strictly a data warehouse layer, not a data delivery layer; delivery still requires physical or virtual star schemas or cubes for business users or BI tools to access.
Bill Inmon in 2008 stated that the “Data Vault is the optimal approach for modeling the EDW in the DW2.0 framework.”
In Parts 2 and 3, I am going to explain the different components of Data Vault and its power with the help of some examples. That will clearly explain why the Data Vault should be a preferred “flavor” for different businesses.
- Jyothi Kaparthi
Better BI on Bigger Data
September 1, 2010
“Can we do BI without a significant infrastructure effort?”
What’s the most common BI tool? Microsoft EXCEL! If you are considering a DW initiative, you are likely encountering resistance from your hard-core, Excel based, analysts. Taking away Excel from them means that they will have less flexibility and spend more time waiting on changes. I’m guessing that they may not be on board with your vision.
1010Data is a zero-footprint, browser based data warehousing product. It has an Excel like feel, but it can handle billions of rows of data (not thousands like Excel). It’s fast, flexible and requires no infrastructure support from IT (if you are in IT, my guess is that you may have already stopped reading this article). The system is not an OLAP system, it requires no design, and not even indexing strategies – but it’s so fast!!
This is a new paradigm shift: from ETL to ELTAR – extract, load and transform as required. Hmmm….cleansing and summarization at query run time; views are used for data governance.
Don’t think it’s true? Dollar General is using the platform and was up and running in 5 weeks to 115 users with a 100% ROI.
What could you do for your customers if you could have their DW up in just a few weeks instead of a few months??
- Jodie
Making Information Available
August 9, 2010
I’m not sure if you’ve noticed, but I’ve not been blogging with the same gusto as of late. Ah the life of a Consultant.
I have been working with a local financial institution creating financial models this summer. (It leaves me with little time for blogging.) I did happen to stop by our 7755 Montgomery Road office today and checked my mailbox. In it was this month’s Information Management mag. I was immediately drawn to this month’s Snapshot: Making Information Available. Here’s some stats for you to consider:
61% of respondents are less than satisfied with their current process of creating information applications and are only lukewarm about their current information application technology. Here are their complaints:
- It takes too long to assemble and deploy applications.
- It is too difficult to assemble and view information into a simple view.
- There are not enough capabilities to integrate and normalize information from disparate applications.
WOW! I ask all of you fellow BI folks out there…what are you doing to solve this problem??? Why is it with all of the tools available today, our users are finding it too difficult to use them!! What are WE doing wrong?
As I mentioned, I am working with a customer on Financial Models this summer. I am fortunate to work with some SUPER SMART people in this group. They have come up with the most ingenious ways of getting their data out of old clunky systems. They can create some of the most INSANE Excel formulas to manipulate data! Their Excel sheets are visually appealing and get data to their management in a timely manner. I’ve had some spreadsheets that have taken me days to figure out the Excel formulas (and I’m a guru!). They are awaiting IT to “build them a DW” to make their lives easier. Here’s to hoping that it can deliver on their expectations! Here’s what I would do to ensure that it does:
1. Use an iterative methodology to build the DW. Recreate existing Excel reports from the DW as you go.
2. Implement a user-friendly reporting tool that allows them to create their own reporting. Give ‘em lots of drag and drop functionality and make sure it can Export to Excel.
3. Create a request process that allows the DW to change with the Business. Creating a process that queues up the work for months and months does not help the business user to create the financial package that’s needed at the end of the month.
4. Keep the model flexible. Doing this will ensure that you can always add a new organization, hierarchy or measurement.
5. Build cubes! These users are smart cookies and they aren’t afraid of a Pivot Table. Give them the flexibility and performance of a cube and let them start to uncover their data.
Hmmm…what’s missing from my list? What would you add?
Happy building!
- Jodie
Using OLAP to Improve Organizational Effectiveness – Part 3
March 21, 2010
This is the third and final post in my series on using OLAP tools to improve the effectiveness of organizations. In Part 1 I discussed some background concepts and terminology. In Part 2, I talked about some specific examples of how OLAP can have an impact in this area. In this post, I’ll talk about a specific application: utilizing OLAP software to provide improved performance feedback to employees.
OLAP and Performance Feedback
Improvements to organizational effectiveness can also be realized by utilizing OLAP tools to provide performance feedback to individual employees. Improved performance feedback will help employees achieve group and individual performance objectives. Increased attainment of these individual and group performance objectives will, with proper alignment of these objectives and organizational objectives, improve organizational effectiveness.
There are several advantages to providing performance feedback with an OLAP tool. If the situation is right, feedback can be provided:
- At an individual level
- On a larger sample of employee activity
- Quickly
- In a meaningful manner.
Common Problems with Performance Feedback
Organizations often make attempts to improve the provision of feedback to employees. Newsletters with departmental performance numbers, posters in gathering places displaying performance charts, and managerial reports with quantitative measures of performance are all attempts to improve the distribution of feedback to employees throughout the organization. One problem with such efforts is that they are usually not provided at an individual level. Feedback on departmental, team, or group performance is certainly helpful but depending on the size of the group, its effect will be limited. Individual performance feedback has its own problem in that it is often time prohibitive to provide extensive individual performance feedback. The result is often weekly or monthly group performance feedback with individual feedback coming only during annual or quarterly reviews.
Individual performance reviews often suffer from another problem: small sample sizes for review. If an insurance company is reviewing the performance of claims adjusters using manually prepared data, it may be impossible to review more than a small sample of the adjuster’s work over what is typically a long review period. Small samples may, of course, result in a flawed appraisal of an employee’s overall performance.
The elapsed time between events reviewed and performance appraisals is also a problem with traditional feedback provision. Consider the timing of typical reviews: an employee makes a mistake in handling a situation in January, the incident turns up in a sample taken in May, and a review is finally conducted in June. If a review had been conducted immediately following the incident, the chance of the employee repeating the mistake would obviously be lower.
Traditional feedback provision often suffers from poor presentation of the message. An interview conducted by a busy manager attempting to perform a number of appraisals in addition to other work may not be optimally effective.
Performance Feedback Improvements with OLAP
Utilizing an OLAP tool may remedy some of the traditional problems with employee feedback. Imagine again the situation of an insurance company reviewing the performance of claims adjusters. As a solution to the problems listed above, an OLAP cube could be developed and made available to adjusters on a daily basis. Adjusters could be presented with individual performance feedback delivered via the web. They could see at a glance how their activity the previous day compared to group averages and organizational objectives. Exceptions could be noted immediately by the individual employee, rather than a manager, and quickly corrected. Feedback could be provided on all activity from the previous day or week rather than on a small, dated sample. Finally, feedback could be presented in easy-to-understand charts which, in addition, roll up to display departmental and organizational performance as well.
Improved performance feedback gives employees the ability to monitor their own performance and to take corrective action quickly. By improving the ability of individual employees to meet their performance objectives, the ability of the organization to meet its objectives and fulfill its mission is improved as well.
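The daily comparison of each adjuster against the group average and the organizational objective can be sketched in a few lines. This is not how any particular OLAP product works; it is a plain Python illustration, and the measure (claims closed per day), the function daily_feedback, and the sample names are all assumptions made for the example.

```python
from statistics import mean


def daily_feedback(activity, target):
    """Compare each employee's daily measure to the group average and a target.

    activity: dict of employee name -> value of the measure for the day.
    target:   the organizational objective for that measure.
    """
    group_average = mean(activity.values())
    feedback = {}
    for name, value in activity.items():
        feedback[name] = {
            "value": value,
            "vs_group_average": value - group_average,  # positive = above average
            "meets_target": value >= target,
        }
    return feedback
```

For example, with an activity of 12, 8 and 10 claims closed and a target of 10, the group average is 10, so the first adjuster is 2 above average and the second is 2 below and misses the target.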
Conclusion
OLAP technology can improve organizational effectiveness by:
- Improving management’s knowledge of progress on objectives
- Improving employee coordination on efforts to achieve these objectives
- Communicating the link between employee effort and performance
- Communicating the link between employee performance and reward
- Improving employee performance feedback.
Although OLAP tools can provide assistance in these areas, their impact is obviously limited by factors specific to each organization. An OLAP tool cannot compensate for poor development of objectives, poor performance reward systems, or any of the other organizational factors discussed. Utilizing an OLAP tool as I’ve described in this series with no attention given to the underlying systems it is trying to address will, at best, have no effect.
In an organization that has clearly defined its objectives and has implemented well-designed reward systems, utilizing an OLAP tool as we’ve discussed can offer a tremendous payoff. The ability to provide employees with improved performance feedback and to demonstrate the link between individual performance and organizational performance is extremely valuable. By helping an organization align individual goals with corporate goals, an OLAP tool can help an organization become more effective.
Good enough?
March 16, 2010
When is good enough, well, good enough? I suppose that depends. One old argument says that close only works in horseshoes and hand grenades. Can it work with decision making? How about decision support systems? Is good enough the manually created spreadsheets that over 90% of organizations use for decision support? I would argue that while it’s not good enough, most business decision makers work that way.
To get at the data that most executives feel they need to make accurate decisions, many turn to the manual modification of existing reports, or the creation of their own “Pet” spreadsheet they use almost daily, or certainly many times a week.
In an update to a report cited last spring on this site, a September, 2009 Dartmouth University study suggests that the error rates in formulas on spreadsheets in their study were only .087% of all formulas they audited. HOWEVER, these were in cases where the formula produced the WRONG RESULT, and actually resulted in 87% OF THE SPREADSHEETS REVIEWED having errors in which the spreadsheet then produced the wrong result.
How good is good enough? What if you could reproduce the “Pet” spreadsheet in a true Business Intelligence solution which would ensure that the data and results in the sheet were as solid as the data in your transactional systems in the first place? How much does the wrong data or the wrong decision cost you, or your company? I would argue that “good enough” might just be good enough, if you could ensure that the data was accurate, and mitigated the possibility of error, while increasing the timeliness of the information to the decision maker. We have deployed such systems in a couple weeks’ time leveraging tools like SharePoint, Excel, and other software products that our customers already owned, and quickly delivered a system to our customer where we dramatically increased the accuracy of their information. These solutions form the basis of our iterative approach to Business Intelligence.
Using OLAP to Improve Organizational Effectiveness – Part 2
February 28, 2010
This is the second in my series of 3 posts on using OLAP tools to improve the effectiveness of organizations. In Part 1 I discussed some background concepts and terminology. In this part, we’ll talk about some specific examples of how OLAP can have an impact in this area.
OLAP’s Impact on Organizational Effectiveness
How can an OLAP tool help improve an organization’s performance as measured against its objectives? Answering this question requires a greater understanding of how strategies and tactics are implemented within organizations. I’ll use a model of organizational effectiveness developed by Michael Beer to illustrate the implementation of strategies and tactics.
The picture below shows a simplified version of a model of organizational effectiveness developed by Michael Beer (Note on Organizational Effectiveness, 10). Business goals and strategy influence and are influenced by top management. Management determines and implements the proper organizational design to achieve the organization’s goals. The design of the organization, in turn, influences human resources attributes of the organization. Finally, these HR attributes directly impact organizational effectiveness.

This simplified version of Michael Beer’s model is presented again below. Added to the model though, is the position of an OLAP tool in improving organizational effectiveness. OLAP technology exerts its influence on organizational effectiveness in three sections of the model:
- Management
- The Measurement and Reward Systems aspects of Organizational Design
- The Coordination aspects of Human Resources.

While the impact of OLAP technology in each of the areas above is slightly different, each is related and shares a common trait: improvement in communication. Utilizing OLAP tools to improve communication requires a broad audience for their utilization. OLAP tools are traditionally utilized by analysts and managers. In this model, front-line employees become critical users of the tool as well. The wide-scale availability of web-based OLAP tools makes such organization-wide implementations cost-effective.
- The expectation that effort will lead to performance
- The expectation that performance will lead to reward (Vecchio, 185).
- Total technical support calls
- Total calls requiring a call-back
- Total number of complaints
- Number of minutes to resolve a call
- Customer survey ratings of support representative performance.
- Their level of individual performance
- Their performance compared to targets and to organization averages.
New Partner: TARGIT!
February 22, 2010
Have you heard of TARGIT? TARGIT is a suite of BI Tools geared toward getting you to BI “in the fewest clicks”. LUCRUM has always been a big believer in doing BI..Faster! This suite of tools is a great tool in our toolbox. We encourage you to learn more: http://www.targit.com/Products/TARGIT_Suite.aspx
Using OLAP to Improve Organizational Effectiveness – Part 1
February 21, 2010
OLAP tools have been widely available for years and are in use in a large number of organizations. They are typically deployed as speedy, easy-to-navigate reporting tools. With a little creativity though, this class of software can also be utilized in a very different manner.
As organizations struggle to communicate their objectives to employees and to align the activities of those employees with the objectives of the organization, they can get help from these same OLAP products. OLAP software can help by providing the capability to:
- Improve management’s knowledge of progress on objectives
- Improve employee coordination on efforts to achieve objectives
- Communicate the link between employee effort and performance
- Communicate the link between employee performance and reward
- Improve employee performance feedback.
In this series of three posts, I’ll talk about the role OLAP tools can play in each of the areas above. But first, I’m going to start out with an introduction to the concept of Organizational Effectiveness. This introduction will give us a structure to frame the rest of the discussion.
I am not going to spend any time defining OLAP. If you’re interested, check here and here for some background and definitions.
Organizational Effectiveness Defined
Effectiveness is defined as simply having the intended outcome. In an organizational context, the intended outcome is the goal of the organization which is usually expressed in a mission statement. The Hierarchical Definition of Strategy provides a framework for defining and explaining these concepts and I am going to use it extensively in these posts.
Hierarchical Definition of Strategy
Explaining organizational effectiveness requires a discussion of business strategy and the Hierarchical Definition of Strategy provides a simple framework for this discussion. The Hierarchical Definition of Strategy is built on the concepts of Mission, Objectives, Strategies, and Tactics (Barney, 10). I’ve drawn a simple figure below to help explain this model:
An organization develops its objectives based on its mission while strategies and tactics provide specific details regarding the attainment of these objectives. In the Hierarchical model, the effectiveness of the organization can be determined by simply comparing actual performance to objectives. Michael Beer summarizes organizational effectiveness in this manner:
“An effective organization is one capable of implementing its strategy … A strategy is implemented effectively when people and groups in the organization work in a motivated, skilled, and coordinated manner on the appropriate tasks.” (Note on Organizational Effectiveness, 10)
In other words, the effectiveness of the organization is determined by its ability to achieve its objectives.
Hierarchical Definition of Strategy – Example
An example will help to clarify these concepts and make them a little more concrete. Dell Inc.’s Mission Statement is:
The high level nature of the statement, though necessary, makes it difficult for individual employees to apply it to their daily efforts. At the next level of the strategy hierarchy, Dell management has likely developed Objectives that will lead to the achievement of this mission. For instance, we can imagine that Dell has defined an objective to “Provide customer support with a customer approval rating of over 90%.” This supports their mission of “…delivering the best customer experience…” and provides employees with a tangible performance target.
The final two levels of the hierarchy are related to execution. Strategy is a means to accomplish an individual objective. Continuing with our imaginary Dell example, the strategy developed might be “Deliver the fastest, most accurate technical support in the industry.” This supports their objective in the sense that a firm delivering the fastest and most accurate technical support would very likely receive high approval ratings from customers. Tactics are execution oriented and exist at the lowest level of detail. In the Dell example, a tactic may be a requirement that all customer support personnel complete a certain set of technical and communication skill classes.
In the example developed above, Dell’s organizational effectiveness can be determined by comparing actual appraisals of their support services with their objective of a 90% approval rating.
Next Post…
Now that we’ve laid out some concepts and terms, we can move on to the heart of the discussion. In Part 2, I’ll dive into the details and talk about how utilization of an OLAP tool can help an organization become more effective.
The Value of Slowing Down: Go Slow to Go Fast!!
February 10, 2010
I once read about a Chinese mathematician who calculated complex scientific formulas by hand using a slide rule. He lamented the rising cadre of scientists who punched formulas into calculators and computers. Although they worked more quickly, the new generation of scientists often lost sight of the concepts behind the calculations. Without this fundamental understanding, the younger scientists often failed to grasp the significance of what they were doing or apply concepts in new ways to make new discoveries or effective designs.
This story parallels an area in Information Technology called “Business Intelligence.” Business Intelligence is also known as “Data Warehousing” and “Executive Information Systems” with dashboards or digital cockpits. The IT organization provides a rich repository of data for the business knowledge workers. Providing data has become very important, and the tools leveraged have become richer and richer in functionality. And yet, the number of business users truly leveraging this kind of technology-oriented business information environment lags, and the organization receives less productivity than it could. Simple questions remain, like: Who are my best customers and why? What’s my best product and what is its margin contribution? Why is my market share in a particular geography increasing while in another market it’s declining? How can I get my business results information faster so I can be more informed on the ever-changing aspects of the market? A user says, I can make a lot of informed decisions… how can I make even more of them instead of hiring more decision-makers? The business and market questions go on and on and on.
As IT professionals, we are used to being held accountable to deadlines with ever changing resources and requirements. In the world of Decision-Making, as data warehousing managers, we often are rushing to meet these same deadlines. Often the deadlines and deliverables overshadow the underlying purpose for building the data warehouse. The good thing about bad times is that they force us to slow down and painstakingly evaluate what we are doing. So, although there are dark clouds ahead, there is a silver lining in the reality of our environment in having to do “more” with “less” resources.
Here are 3 tips to consider for making your Data Warehousing environment even more “ready” for business decision-makers.
- Meet with the Business Decision-Makers frequently. I am suggesting that a weekly meeting at a minimum would be beneficial in order to review their data, listen carefully to understand what data they are really using, and what data they may be leaving behind. Is the data they are leaving behind the result of not understanding how to use the data, is the data no longer relevant to their decisions, or perhaps the data is too summarized or too detailed?
- Document the business flow of the data graphically using business terms, not technology metadata definitions. Distribute the business document to all business and IT users so that everyone really knows how the data is being used in the context of business. Too often, we revert to memorizing the technical definitions and only use them. We lose the business context and as new people join the data analysis, the true business definitions are lost.
- Proactively have discussions sponsored by IT with the Business Users about the cleanliness of the data and how IT is transforming the data. Show them the techniques that you are using to cleanse the data and transform it so that there’s a common repository of data that they can use. The more the Business Users understand what you do in context of the IT problem, the more they will provide their insight into how the data is most meaningful to use.
Chinese “Business Intelligence” Proverb: If you plan for one year, plant rice. If you plan for 10 years, plant trees. If you plan for 100 years, educate mankind.







