How Necessary is a Data Warehouse?
November 28, 2011
The largest and most complex aspect of Business Intelligence (BI) is the data warehouse. In this context, the data warehouse is the repository of data generally fed from many sources to keep historical perspectives of an entity’s data. It is a behemoth that is generally expensive, slow to build, complicated in structure and difficult to maintain. How necessary is it? Does a company need the actual, physical data warehouse to have a successful and sustainable business intelligence (BI) program?
There are many design methodologies that take these issues into consideration. There are advantages and disadvantages to both traditional (and non-traditional) methodologies which I do not cover in this post. My goal is to bring up points of view of why and when a data warehouse may or may not be used. What I would like to cover is:
- The Corporate Information Factory (CIF), based on the Inmon approach
- The Kimball Style of data warehousing
- BI using no data warehouse at all
Corporate Information Factory
The Corporate Information Factory methodology, in a nutshell, says there is no getting around the need for a data warehouse. In order to have a successful and sustainable BI program, a data warehouse is needed. Not only is it needed, it must be completely designed, built and populated before any further analysis or BI work can be done. This is due to the way business concepts are intertwined with each other, which necessitates the big picture view. This style also views the architecture process more from the IT/data perspective than from the business need point of view.
Kimball Methodology
The Kimball methodology of data warehouse design is not as structured and regimented as the Corporate Information Factory. The Kimball data warehouse is the sum of its parts, meaning one area of the business could be designed, developed and deployed, providing BI insight, while other aspects of the business have not yet been discussed. This approach speeds the development of the data warehouse compared to the CIF, but the underlying data warehouse can become much more complex as more and more is added to it, along with the possibility of rework. This style views the architecture process from the business needs point of view rather than the IT/data perspective.
No Data Warehouse?
What about not using a data warehouse? In the new age of Data as a Service (DaaS), Master Data Management along with Service Oriented Architecture (SOA), why re-store data from disparate systems? Why not store the metadata of where the data is found and attach the business logic to the SOA call? This can be a very powerful way to gain insight into data. The idea is that the development normally done for a data warehouse can be done without the data warehouse itself. There are already tools that will do this. One of them is Qlikview from Qliktech. The basic premise behind this tool is to allow the user to develop the Transform and Load aspects of ETL (Extract, Transform and Load) in memory to deliver very quick analytics in a solid visual manner. This tool is not a methodology, but SOA could be used in a larger context with the same principles. This style views the architecture process as something the business could do, but IT does not have to do.
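As a rough illustration of storing only the metadata of where data is found and attaching business logic to the service call, here is a minimal Python sketch. The two fetch functions, the CATALOG dictionary and the net revenue rule are all hypothetical stand-ins for real SOA calls; none of this represents Qlikview or any other vendor's API.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins for service calls to two separate source systems.
def fetch_orders() -> List[dict]:
    return [
        {"order_id": 1, "qty": 2, "unit_price": 10.0},
        {"order_id": 2, "qty": 1, "unit_price": 25.0},
    ]

def fetch_returns() -> List[dict]:
    return [{"order_id": 2, "refund": 25.0}]

# The "metadata": a record of where each data set lives, not a copy of the data.
CATALOG: Dict[str, Callable[[], List[dict]]] = {
    "orders": fetch_orders,
    "returns": fetch_returns,
}

def net_revenue() -> float:
    """Business logic applied at query time; nothing is re-stored."""
    gross = sum(o["qty"] * o["unit_price"] for o in CATALOG["orders"]())
    refunds = sum(r["refund"] for r in CATALOG["returns"]())
    return gross - refunds

print(net_revenue())
```

Every call goes back to the sources, so the answer always reflects their current state and nothing else.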
The idea that a data warehouse is necessary for a successful BI implementation is not necessarily true. A data warehouse is not necessary to have analytics or provide a picture of the data you have. I believe, however, it is very questionable whether this approach is sustainable or can deliver every benefit of BI. The very important aspect of BI that cannot be overcome by SOA, or in-memory analytic tools like Qlikview, is the entire reason the data warehouse first came about.
The decision to build or not build a data warehouse is all about the history of the data. This is not the history that is required by law to be kept, like financial data or what in many cases is considered ‘facts’ in the Kimball style. If this were the only history needed, a data warehouse would be less necessary. The type of history that is important is the history that cannot be reproduced within the source systems. This is the history of changes made that are not kept by the source system. In many cases a customer’s address may not be historically important in a transactional/source system, so only the most current record is kept. If that history is not kept somewhere (like a data warehouse), analytics of historical purchases of products will not show a true picture of what actually happened. It will only show the picture of what is in the source system at the current point in time. This situation is the quintessential linchpin for why a data warehouse should be necessary. The ability to track and keep history that is not kept in the source system is something SOA and in-memory BI are not capable of reproducing.
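The customer address example can be sketched in a few lines of Python. This is only an illustration under assumed data: the customer, cities, dates and amounts are invented, and the validity-date layout is one common way of keeping such history, not the only one.

```python
from datetime import date

# The source system keeps only the current address (hypothetical data).
source_customer = {"customer_id": 7, "city": "Denver"}

# The warehouse keeps every version of the address with validity dates.
customer_history = [
    {"customer_id": 7, "city": "Cincinnati",
     "valid_from": date(2009, 1, 1), "valid_to": date(2011, 6, 30)},
    {"customer_id": 7, "city": "Denver",
     "valid_from": date(2011, 7, 1), "valid_to": date(9999, 12, 31)},
]

purchases = [
    {"customer_id": 7, "purchased_on": date(2010, 3, 15), "amount": 120.0},
    {"customer_id": 7, "purchased_on": date(2011, 9, 2), "amount": 80.0},
]

def city_at(customer_id, when):
    """Return the city that was valid for the customer on the given date."""
    for row in customer_history:
        if (row["customer_id"] == customer_id
                and row["valid_from"] <= when <= row["valid_to"]):
            return row["city"]
    return None

def sales_by_city(use_history):
    """Total purchases per city, with or without the kept history."""
    totals = {}
    for p in purchases:
        if use_history:
            city = city_at(p["customer_id"], p["purchased_on"])
        else:
            city = source_customer["city"]
        totals[city] = totals.get(city, 0.0) + p["amount"]
    return totals

print(sales_by_city(use_history=True))
print(sales_by_city(use_history=False))
```

With the history, the 2010 purchase is attributed to Cincinnati. Without it, every purchase appears to have happened in Denver, which is the current picture rather than the true one.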
If the desired BI capability for the business is operational in nature, a data warehouse will not offer any significant benefit over SOA. This is a short-sighted, tactical way of looking at data and cannot provide strategic insight, but it certainly could be the best way to answer that need for data given the circumstances. This would not be the end-all-be-all for BI, but it certainly can provide a means to start a program.
So does this completely answer the question “Is a data warehouse necessary for BI?” The data warehouse is necessary for a complete and sustainable BI program, but it does not have to be the start of the program. So… of course the answer to that is still…. “It depends…”
… Doug
The idea of an Information Democracy…
November 21, 2011
During the Business Intelligence Symposium presented by Lucrum in conjunction with the University of Cincinnati, College of Business, Filippo Passerini, Group President of Global Business Services and CIO of P&G, promoted the idea of an Information Democracy. He is not the first person to use this phrase, only the latest to try and specifically define what is meant by the term. Providing an Information Democracy gives data consumers freedoms similar to those enjoyed by the citizens of the U.S. democracy:
LIFE: the growth of an organization using data driven decisions (a company that is not growing is dying)
LIBERTY: the ability to quickly make the appropriate decisions based on data (a company is less constrained when it makes competent, data driven decisions)
THE PURSUIT OF HAPPINESS: the ability to improve profits (what company is not happier with more profit??)
Since there are many types of democracies, the term Information Democracy is not easily defined. Mr. Passerini discussed his idea of Information Democracy as providing the same information at the same time to all who should view that data. This lateral information exchange has given P&G unprecedented access to data, making their decision making quicker and based on current data.
The purpose of their Information Democracy is to provide not only one version of the truth, but the same version of the truth to everyone. This might sound like the same concept, but there is a subtle difference and it deals with the latency of the data and ability to massage results. It tries to eliminate the “My data shows…” statements made by many because the data is owned and seen by all people at the same time. There is no delay to anyone in receiving data, no standardized reports to be re-issued, no side data to be pulled into Excel to get a different look, just the data received in a dashboard/cockpit environment.
The delivery of the data in Mr. Passerini’s Information Democracy is prolific. The same pieces of information are delivered via mobile devices, traditional PCs or P&G’s Business Sphere environment (a conference room of walls with electronic displays filled with information). The same data provided at the same time to all parties involved using multiple delivery devices allows the entire P&G managerial structure to evaluate data wherever they may be. This pervasive data culture is another example of P&G’s increased ability to adapt their business more quickly in a team environment.
The Information Democracy has not come easily at P&G as they have had to overcome obstacles. It has taken a huge effort to change the culture to embrace data for data driven solutions. Security issues make the delivering of data to all the necessary people difficult. The technology to do this is available, but the governance was generally lacking. These issues must be addressed, as P&G has, prior to successfully implementing the idea of an Information Democracy.
Transparency of the data (showing the same data to all necessary parties), timeliness of the data (getting the data to all parties as early as possible), and transportation of the data (delivering the data in multiple formats for easy consumption) make up the three branches of the Information Democracy, much like the executive, legislative and judicial branches make up our democracy. With these branches and the appropriate data governing processes, there truly can be an Information Democracy allowing data “…of the people, by the people and for the people.”
…Doug
A Data Architect’s Initial View of Data Vault
November 18, 2011
As a data architect who had never heard of the data vault concept, I was skeptical of the value and validity of the data vault. The timing of the training was not the best considering it was held after a long day of work, but what made it interesting was the passion and expertise shown by Jon Shirey, a Principal Consultant with Lucrum. The training was an introduction to Dan Lindstedt’s concept of the data vault as a way of showing the validity of the method.
Having designed and implemented Kimball style data marts and data warehouses, I was very skeptical at first. The Kimball methodology has been proven time and again. How could something like the data vault, which has been around for a decade, receive so little push from mainstream Business Intelligence groups like TDWI? How could this seemingly oversimplified methodology really do what it claims?
As the presentation continued, Jon started asking if there have ever been issues with changing the data model in the Kimball style. Of course there have been; hardly anyone ever gets the first cut at a star right. There always seem to be things missed, either by looking in the data or by the subject matter experts forgetting to explain something. It is one of the largest hurdles to overcome in a star design, but everyone has to deal with that… right?
One huge advantage to the data vault is a way to easily get around this issue. If something is missed in the design, the data vault is adjusted; the star design is adjusted and reloaded. No data is ever lost in the staging areas due to the misunderstanding of requirements because the data vault methodology works around this. The history is kept in the vault so the ‘loading’ into the star could be re-done every time the requirements change without anything ever being lost.
Of course no data architect worth his salt would ever make a design that would lose data like that… Really? The staging areas have always been designed correctly by the team, every single star design has been flawless, and the business has never changed how business is done? The crux of this comes down to not having to ‘cook’ business logic into how the star is loaded from the source. All the data is loaded into the data vault first and then business logic can be used to load a specific data mart/star schema. This idea in the methodology is a genius way to never have to say ‘sorry’.
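The principle of loading everything raw first and applying business logic only when the star is built can be sketched as follows. This is a deliberately simplified Python illustration: the rows, the status values and the build_sales_fact helper are hypothetical, and the list stands in for the vault without modeling its actual table structures.

```python
# Raw rows kept exactly as the source delivered them (hypothetical data).
vault_orders = [
    {"order_id": 1, "status": "SHIPPED", "amount": 100.0},
    {"order_id": 2, "status": "CANCELLED", "amount": 40.0},
    {"order_id": 3, "status": "RETURNED", "amount": 60.0},
]

def build_sales_fact(raw_rows, counts_as_sale):
    """Rebuild the star's fact rows from raw data using the current rule."""
    return [
        {"order_id": r["order_id"], "amount": r["amount"]}
        for r in raw_rows
        if counts_as_sale(r)
    ]

# First understanding of the requirement: anything not cancelled is a sale.
fact_v1 = build_sales_fact(vault_orders, lambda r: r["status"] != "CANCELLED")

# Requirement changes: only shipped orders count. Reload from the same raw rows.
fact_v2 = build_sales_fact(vault_orders, lambda r: r["status"] == "SHIPPED")

print(len(fact_v1), len(fact_v2), len(vault_orders))
```

Because the rule lives in the load step and not in the stored data, changing the rule costs a reload and loses nothing.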
When this benefit of the data vault was covered, my initial thought was, well this is why you have a staging area. All the data will be kept there so you can always come back to it later. The more I thought about that idea, the more I recalled how complex the staging area becomes in trying to do this. There is no way to model all the twists and turns needed to code this way, or to provide the space needed to keep a traditional ETL environment up and running. The more I thought about it, the more I began thinking a traditional staging area and its complexities are a huge headache! The simpler design using the data vault methodology as the persistent staging area offers huge benefits over the traditional Kimball style data warehouse staging area. This includes repeatable code use in building and populating the data vault as well as the ability to easily account for and validate the data.
Validating the data in the star can be daunting due to the business rules involved. In order to validate the numbers found in the star, the logic used to build the star has to be applied to the source system in order to compare apples to apples. This is not necessary in the data vault. The simplicity of the data vault makes auditing and validation easier. Since the data gets entered into the system as it is in the source, auditing becomes easy.
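Since the data is stored as it is in the source, an audit can be a direct row-by-row comparison with no business rules to reapply. A minimal Python sketch, assuming invented rows, a hypothetical load_date column added at load time, and a hypothetical audit helper:

```python
# Rows as they exist in the source system (hypothetical data).
source_rows = [
    {"order_id": 1, "status": "SHIPPED", "amount": 100.0},
    {"order_id": 2, "status": "CANCELLED", "amount": 40.0},
]

# The same rows after loading, with only a load date added.
vault_rows = [dict(r, load_date="2011-11-18") for r in source_rows]

def audit(source, vault, key):
    """Return keys that are missing from the vault or differ from the source."""
    vault_by_key = {v[key]: v for v in vault}
    problems = []
    for s in source:
        v = vault_by_key.get(s[key])
        if v is None or any(v.get(col) != val for col, val in s.items()):
            problems.append(s[key])
    return problems

print(audit(source_rows, vault_rows, "order_id"))
```

An empty list means every source row arrived unchanged. The same check against a star would first require reproducing the star's load logic on the source side.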
As this topic was being covered, my Kimball instincts took over and said, “Well, we have to do this anyway and can use the staging area…” This can be true, but the simplicity of the data vault methodology makes this process so much easier and includes the ability to take commonly used data with different meanings in different departments (think customer to sales vs. marketing…) and easily link that data together. This is no small task in the Kimball data warehouse world.
After one session and after thinking it over, I have come to the conclusion that using the data vault as the persistent staging area for a Kimball style data warehouse, or as the non-user accessed data warehouse for the Inmon style, is the best way to allow for quick design that truly can be iterative. The ease of getting this design started and the flexibility to easily change what is presented to the user through data marts make the data vault concept truly unique.
Turning an Idea into an Innovation
November 3, 2011
How many times have you told yourself . . . why didn’t I think of that? Or, how many people have gotten rich off an idea you already thought of 10 years ago?
We all are more creative than we think. The difference is in knowing how to turn a good idea into an innovation. What’s the difference? A good idea is just that – an idea. An innovation solves several problems at once and brings value to customers by helping them solve a problem they didn’t even know they had.
Innovations Aren’t Always Planned
When biologist Alexander Fleming came back from vacation and found the bacteria in one of his petri dishes had died, he didn’t view it as a failure; instead he recognized that something extraordinary had happened. From this ‘unplanned accident’ came the discovery of penicillin.
Fleming’s discovery illustrates what Yale psychologist Robert Sternberg calls ‘selective encoding’. Selective encoding is the ability to distinguish important information from irrelevancies. The key is being able to detect the relevant ‘signal’ amid irrelevant ‘noise’, which is accomplished by determining the critical information components that bring value to the situation. For example, is it necessary to be given instructions on how to build a watch, if all that’s needed is the time?
A unique eye for detecting patterns among unlike combinations, and for separating noise from news, makes it possible to solve perplexing problems. This capability is what catapults an idea into an innovation.
Metaphors Make the Impossible, Possible
Turning an idea into an innovation is the ability to draw comparisons and analogies from juxtaposing elements or ideas that ordinarily don’t go together, and recognizing the hidden pattern of connections between them. It’s not enough to be able to pick out all the right or new pieces; being able to put them together in a new way is what is crucial.
The best tool to assist in combining unfamiliar concepts is the metaphor . . . thinking in terms of something being ‘like’ something else. For example, what do burrs and socks have in common? Velcro. The concept of interlocking ‘hooks’ gave way to a new fastening system. A new fastener had not been invented since the mid 1800s.
Developing ideas from metaphors involves changing the way a question is worded, or brainstorming on ‘What if’ scenarios. What would happen if a coin punch and a wine press were combined? The Printing Press. What would happen if customers could order products whenever they wanted? Electronic Commerce.
Metaphors draw a mental picture. This picture is especially useful when communicating a concept that is ‘un-like’ any other product or service. Referencing something that is similar in concept, makes the unfamiliar, familiar.
Every Innovation Needs to Be Needed
Important to any innovation is the timeliness of its purpose – a context in a relevant time. The purpose could be in response to competition, to demand or need, or in response to new regulation.
A national retail pharmacy chain responded to legislation that requires pharmacists, not technicians, to dispense pharmaceuticals to customers. Without hiring additional pharmacists to support the additional tasks, the innovation came by redesigning the physical store layout and the way tasks were performed.
Once the idea of how to comply with legislation was developed, it was presented to the audience in a context that had meaning to them. Meaning for a customer may mean validating the idea against a list of criteria (relevant information) such as budget, time, and resources.
Validating the idea meant gathering information with regard to technical requirements, safety issues, production capabilities, etc. The idea at this stage of development, answered more questions than it generated. At this stage, the idea became an innovation.
In redesigning the pharmacy layout and the division of tasks between people, several problems were solved at once. Not only was the retail pharmacy chain able to comply with the new legislation, but as a result, an unplanned benefit occurred – prescription renewals escalated by over $200,000 per store.
The Impossible is Possible
Ken Olsen, president of Digital Equipment Corporation, stated in 1977 that “There is no reason for any individual to have a computer in their home.” We all learn from our mistakes and missed opportunities. The value of an innovation versus an idea, however, is in bridging for the customer the gap between ‘I think it will work’ and ‘I know it will work’. It means recognizing patterns and separating necessary information from the irrelevant to create solutions that were never imaginable.
It is providing that level of detail and ingenuity that turns an ordinary idea into an extraordinary innovation.
Prepared for LUCRUM, Inc. by Susan Thomas, October 28, 2011
Employing Knowledge Management
November 3, 2011
Employing Knowledge Management to Reduce Information Overload
and Gain Competitive Advantage
In 1994, Timothy O’Brien produced a play called An Object Orientation, in which two characters search in vain for a piece of lost information, only to realize that what they were really seeking was meaning, not data.
It is obvious we live in an information age where overload and information anxiety have reached epidemic proportions. What can be done to sort through what we need, and when we need it, in order to produce meaningful results?
As Thomas Stewart states in his book, Intellectual Capital, “Intelligence becomes an asset when some useful order . . . when it is given coherent form . . . when it can be deployed to do something that could not be done if it remained scattered around . . . Intellectual capital is packaged useful knowledge.”
Classifying It
The primary purpose of intellectual capital is innovation. Human capital grows in two ways: when an organization uses the skills and experience of its people, and when more people contribute their knowledge effectively to the organization. Two principles of knowledge are classification and recognition.
Classifying and managing intellectual assets requires knowing what you are trying to accomplish. What would make a profound difference in the product or service delivery cycles? For example, the purpose of managing problems is to respond more quickly and attain a high level of customer service. The purpose of managing workload is to be able to predict requirements for staffing and adjust for peak loads. The purpose of managing skills is to ensure that the right skills are in the right place at the right time. The purpose of managing service levels is to monitor customer expectations.
Only when a problem has been identified then classified, can studies be conducted to isolate and resolve a particular incident. The next step is to understand the information.
Understanding It
To find order amid the chaos of data overload, concentrate on providing information in context to provide meaning. It is information in context that reduces the amount of time necessary to interpret (or misinterpret) a message.
For example, it is difficult to visualize the size of an acre, if one has never seen an acre. To make the unknown familiar, provide context – an acre is roughly the size of one football field (without the end zones). Announcing the pollen count is useless, unless the count is given within a range of numbers (e.g. upper and lower limits). Now I have meaningful information.
Determining what kind (classification) of information is involved, quantitative or qualitative, is important to understanding. Qualitative information is descriptive – it’s hot today. Quantitative information is numeric – it is 94 degrees. Depending on your business, knowing the difference between qualitative and quantitative information and using them in the appropriate context can, at times, mean the difference between life and death.
A pilot needs to know the exact distance (quantitative) from another aircraft. It is of little value knowing the other plane is ‘very close’. In this scenario, relaying the correct classification of data (quantitative), allows the pilot to make an informed decision and correct the situation.
Information is power, but it can lead to problems. It encourages people to hoard information, manipulate it, and possibly use it as a weapon. However, an active knowledge management network tends to be self-correcting – especially if your task performance depends on it.
Managing It
Only mismanagement of customer information can explain why U.S. companies on average lose half of their customers in five years.
Knowledge sharing only works when you pay attention to organizational needs and processes of the user. Managing knowledge involves understanding the relationships between functions and people, not isolated tasks. It incorporates how well people can access, interpret and utilize the information. The goal is to allow everyone in the organization to act faster and make more informed business decisions.
A knowledge management network can be formal or informal. An informal network is manual, word-of-mouth. A formal network typically includes an electronic system relying on integrated software to make information flow. A managed knowledge network includes clearly defined business rules that standardize procedures and incorporate personal accountability.
But more importantly, a managed knowledge network lets people focus on business, not technology. The benefit to the organization is the ability to make informed business decisions quicker than the competition.
Summary
Knowledge is an advantage. It prepares your company to react to unplanned events. It allows one to identify trends in customer activity. Given the speed with which organizations change, managing for competitive advantage means managing knowledge. If I understand information, I can use it. If I can’t understand it, it’s useless, and all the time spent classifying it, trying to understand it, and managing it is wasted.
The efficiency, the agility, with which a company can augment intellectual capital, is the true measure of its effectiveness in the Knowledge Management Age.
Prepared for LUCRUM, Inc. by Susan Thomas, October 28, 2011


