We do BI…faster
December 4, 2009
When asked, I say that I’ve been working on BI projects both accidentally and on purpose since 1993. The first Business Intelligence books were published in 1991 (Inmon) and 1996 (Kimball). TDWI, The Data Warehousing Institute, wasn’t started until 1995. So the Business Intelligence / Data Warehousing practices didn’t really start to become players until the mid-90s. I consider all work that I did prior to 1998 to be “accidental BI”. Looking back, if I had understood dimensional modeling in 1994 I would have actually finished my Activity Based Costing project…but I digress.
There is a “new player” in the field of DW and his name is Dan Lindstedt. Dan’s view of DW is called The Data Vault. Endorsed by Bill Inmon (which speaks volumes), his method is an approach to the ODS that is described as “a hybrid approach encompassing the best of breed between 3rd normal form (3NF) and star schema. The design is flexible, scalable, consistent and adaptable to the needs of the enterprise. It is a data model that is architected specifically to meet the needs of today’s enterprise data warehouses.”
Today I sat down with my good friend Jon Shirey. Jon and I first worked together in 1993. He is a true Computer Science scholar who takes a disciplined and well-thought-out approach to data warehousing. He is by no means a cowboy and is always the #1 guy that I go to when I run into a tough question that needs an answer. Jon turned me on to The Data Vault a few years ago when he first attended training.
I was truly impressed with the architecture of The Data Vault. To me, it seems to allow for Real Time Warehousing and improves the auditability of data. When coupled with tools like Balanced Insight Consensus®, rapid BI really becomes a reality. My wheels are spinning…I need to keep processing…I’m so excited…stay tuned for updates…
How Many Versions of The Truth Does Your Company Have?
September 11, 2009
Years ago, while at one of my previous employers, I was part of a cross-functional team of people from across the organization given the seemingly simple task of identifying our best customers. Seems like this should have been pretty easy, right? Define what best means, run a report or two, discuss the findings and cross it off the “to-do” list. Oh, how naive I was to believe it would be so smooth.
In reality it took months to determine that there were in fact numerous clusters of our best customers, yet no master list that everyone could agree upon. Marketing had its list, and so did sales, and accounting, and finance, and operations. Everyone was pulling different points of data from different systems, and thus getting radically different results. We had multiple versions of the truth. Sadly, we never did figure it out. A month into the project, the company was bought out and we were pretty much all downsized out within a couple of months, which actually ended up being a great thing for me. Still, that experience has stuck with me, and taught me a valuable lesson: to be useful, data needs to present one version of the truth.
Had we been able to create this list quickly and accurately, we could have spent time trying to determine the best ways to serve our best customers. We could have examined ways to get new customers that were similar to our best customers, convert good customers into great ones, and improve the results of the organization. Instead we spent time and money simply trying to figure out whose list was the right list. What a waste!
As the “marketing guy” I am not the foremost authority at LÛCRUM on business intelligence and data architecture. Still, having gone through that experience makes me genuinely appreciate the services we provide our customers. I have been in their shoes: seeking the truth, seemingly finding it, wanting to move forward, only to have everything come to a screeching halt because of inconsistent, incomplete information. It is a frustrating situation to be in, and I smile every time we solve such a problem for a client.
If you are spending more time debating the truth as described by your data than you are determining what to actually do about it, you should talk to us. We can deliver one accurate, actionable, complete, and timely version of the truth for you – saving you time, money, and a whole lot of aggravation.
It’s 11 O’Clock. Do You Know Where Your Databases Are?
November 11, 2008
Where are your databases? (And what is your strategy for knowing…)
Saying and doing are two different things. This is true, and it’s also a good delineation along the lines of databases. How’s that, you say? Well, for one thing, I have not worked anywhere recently (last dozen years or so) that had only one database technology in play. Different products have different value propositions, especially over time, as new features and capabilities are rolled into existing product lines and even new database technologies are introduced. Recently I had an engagement where they used IBM’s DB2, Oracle 10g, SQL Server, Sybase, MySQL, Informix and even Red Brick (never heard of that one before). Sounds like it’s out of control, doesn’t it? Well, to tell you the truth, it wasn’t, and here’s how they got there.
First of all, they had a strategy. This strategy was built on a common taxonomy of terms that was agreed upon by all parts of the organization. These terms were used to classify and underpin the database technologies. With these terms, there was a foundation and a single version of understanding on which to base conversations and negotiations. Here is an example that might be helpful for you.
Proposed (or Candidate) – This category represents those technologies (databases) that someone or a team would like to be considered as part of the enterprise architecture. The process has not yet started for technologies with this classification, but they have been identified as having some initial momentum towards inclusion.
New (or Emerging) – This term does not reflect the current state of the technology, but rather the current state in the context of the organization. For example, the technologies with this classification have been approved and accepted as part of the enterprise’s strategy, but with a limited focus. These technologies indicate that there is a high degree of oversight in when, how and where they are used. It could well be that there is a need for DB2 (which is not a new technology by any stretch of the imagination), and in this case it could be that DB2 is ‘emerging’ in the context that the organization does not yet recognize this as a standard, but has agreed that it can be (should be) used within limited scope with oversight.
Standard (or Approved) – This term denotes that these technologies and products are used within the normal business practices. When these products are leveraged, there is not much discussion on scope and the oversight is usually limited to the technical framework in place versus the bigger discussions of should we and could we and how would it impact the rest of the organization and so on… We like to have standards so that we can focus our attention on solving business problems and not on political positioning and other side issues.
Contained (or Restricted) – This term refers to those technologies that are ‘restricted’. These are non-standard and are usually relegated to existing applications. This is an important part of the strategy. While the scope is known, the oversight is focused on limiting the proliferation of these technologies for reasons of enterprise impact on the overarching strategy.
Declining (or Phasing Out) – This term refers to all those technologies that are being identified as ones we wish to sunset. Here this technology, while still in production, is not allowed for any new use and a plan is in place (that also includes a sunset date) for its removal from the organization.
Retired (or Removed) – This term refers to those technologies that are not allowed to be used at all under any circumstances within the organization. A formal write up of the justification and historic retirement plan must accompany the technology.
One important thing to note is that this taxonomy of classification is not linear. It could be that MySQL was identified as a candidate and, after the proper feasibility discussions and general impact analysis, it was promoted to emerging. After some experience and discussions, it might be that MySQL becomes established within a niche use. When this is the case, MySQL would be promoted to contained. It is not that we want to phase out the product; rather, we want to liberate its growth but highly restrict it to a certain scope of use.
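To make the non-linear flow concrete, here is a minimal Python sketch of the taxonomy as a small status registry. The six status names come from the list above; everything else is an assumption for illustration only, including the `ALLOWED` transition rules, the `reclassify` helper and the sample registry. The engagement described here did not necessarily use rules like these.

```python
from enum import Enum


class Status(Enum):
    PROPOSED = "Proposed (or Candidate)"
    NEW = "New (or Emerging)"
    STANDARD = "Standard (or Approved)"
    CONTAINED = "Contained (or Restricted)"
    DECLINING = "Declining (or Phasing Out)"
    RETIRED = "Retired (or Removed)"


# Hypothetical transition rules: the post only says the flow is not linear,
# so this map is an assumption an organization would replace with its own.
ALLOWED = {
    Status.PROPOSED: {Status.NEW, Status.RETIRED},
    Status.NEW: {Status.STANDARD, Status.CONTAINED, Status.DECLINING},
    Status.STANDARD: {Status.CONTAINED, Status.DECLINING},
    Status.CONTAINED: {Status.STANDARD, Status.DECLINING},
    Status.DECLINING: {Status.RETIRED},
    Status.RETIRED: set(),
}


def reclassify(registry, technology, new_status):
    """Move a technology to a new status if the transition is allowed."""
    current = registry[technology]
    if new_status not in ALLOWED[current]:
        raise ValueError(
            f"{technology}: {current.name} -> {new_status.name} is not allowed"
        )
    registry[technology] = new_status
    return registry


# Sample registry (made-up starting classifications).
registry = {"MySQL": Status.PROPOSED, "DB2": Status.NEW}

# The MySQL path described above: candidate, then emerging, then contained,
# without ever passing through "Standard".
reclassify(registry, "MySQL", Status.NEW)
reclassify(registry, "MySQL", Status.CONTAINED)
print(registry["MySQL"].value)
```

Keeping the rules in one table means the organization can debate and change the allowed moves without touching the rest of the process.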
Start to understand your environment and where your technologies sit within this taxonomy; that is database strategy 101. Set goals to identify those that are too costly and create a strategy to sunset them. At the same time, look out on the horizon and see those new (or new to you) technologies that may play a part in your organization in the (near) future. Start to embrace them and set processes up to manage them as they flow from term to term within the organization. Determine what your real standards are and socialize them. It could be that your headaches of the future can be averted if you properly set up a containment strategy. The benefits are there for you to harvest, but you have to do your part by leading the way.
Once we know what we have and classify those database technologies (or any technology or practice), we can then focus on the implementation of the strategy: Better, Faster and Cheaper. Without this mature taxonomy and disciplined approach, we will struggle to get a hold of our infrastructure, and we need to be able not only to get a hold of it, but to wrestle it to the ground and show it who is boss!
Enjoy the Journey!
~ Scott Felten
Oracle supports Microsoft
May 16, 2008
I can’t tell you how many times I’ve been in conversations around the topic of “Oracle vs. Microsoft”. I’ve heard both sides of the story ranging from “SQL Server for mission critical operations…are you crazy!” to “Oracle costs me my first born child…year after year!”. While these discussions are often entertaining, the line delineating the two database giants is blurring with each subsequent release.
In my years consulting for LÛCRUM, I have worked for numerous clients that have had installations of both Oracle and Microsoft running in their environments. With recent statistics estimating that Oracle controls >50% of the database market and Microsoft controls >50% of the server operating system market, are you surprised? SQL Server runs only on Microsoft. Oracle offers more operating system versatility. While you’ll see UNIX and Linux installations, Oracle’s ability to run on Microsoft remains strong, and it is improving its functionality with respect to Microsoft development. Where might an Oracle database deployed on a Microsoft server make the most sense? In the small and mid-sized business (SMB) market, where Oracle has competitively priced versions such as Oracle Database Standard Edition and Standard Edition One.
So what advantages does running Oracle on Microsoft have to offer? First, Oracle has tight integration with Active Directory and the Windows Security Framework. Items such as single sign-on and security via database role and Active Directory group fall into this category. Next, Oracle offers 32-bit and 64-bit versions. In the 32-bit version, Oracle is able to utilize up to 3GB (out of a 4GB O.S. maximum) of system memory for database use. Finally, Oracle has also been working on enhancing its ability to integrate with the Windows development suite, specifically Visual Studio 2008. Oracle supports .NET in 3 ways. First, the Oracle Data Provider for .NET leverages the ADO.NET API and allows .NET applications to access Oracle data. These APIs should be familiar to most Microsoft developers. Second, through an add-in (free, for that matter), developers can work with Oracle services via Visual Studio 2005 (and 2008, as previously mentioned). Through the development suite, developers have access to wizards for various database tasks (e.g., DDL), a procedure editor (for PL/SQL procedures, packages, and functions), a debugger for runtime error interaction, and integrated help for items such as the Oracle error reference and the SQL and PL/SQL user manuals. Lastly, Oracle has integrated .NET extensions directly inside the database. This allows developers to create stored procedures and functions using C# or VB.NET within Visual Studio. This code can then be deployed to the database and referenced wherever a stored procedure or function is permitted.
Oracle has shown it is advantageous to offer solutions that fit neatly into an operating system that controls the majority of the server market, even if that vendor also happens to be a major competitor in the database market. Offering a product that is extensible and easy to use with development GUIs is sure to give you a seat at the table when it comes to choosing a solution for your organization. That is precisely why Oracle supports Microsoft (most of the time <grin>).
Dave
New Models in Warehousing and PaaS
February 29, 2008
Is a column-oriented database the optimal format for a warehouse? Database pioneer Michael Stonebraker thinks so. InformationWeek reports Stonebraker’s assessment that a column-oriented database improves warehouse performance 50x, and the larger the warehouse, the greater the gain. Why? Warehouses typically store transactional data, where a row of data stores many pieces of one transaction. To aggregate a single column, the typical row-based DBMS would retrieve all rows and then aggregate the selected column; a column-based DBMS would not require the same overhead of row processing. Because the values within a column are generally of similar format, columns could also gain compression and storage efficiencies. Interesting thoughts. How viable is a column-based platform? I’m not sure, but Vertica has secured $23.5MM in venture funding to find out.
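The row-versus-column argument is easier to see with a toy example. The Python sketch below is only an illustration with made-up data and helper names (`total_row_store`, `to_column_store`, `total_column_store`, `run_length_encode`); it does not reflect how Vertica or Sybase actually store data, and an in-memory list cannot reproduce the disk I/O savings that drive the reported gains. Run-length encoding is used here as one assumed example of the compression a column of similar values makes possible.

```python
# Made-up sales transactions: each row holds many pieces of one transaction.
rows = [
    {"order_id": 1, "customer": "A", "region": "East", "amount": 120.0},
    {"order_id": 2, "customer": "B", "region": "East", "amount": 80.0},
    {"order_id": 3, "customer": "A", "region": "East", "amount": 200.0},
    {"order_id": 4, "customer": "C", "region": "West", "amount": 50.0},
]


def total_row_store(rows, column):
    """Row layout: every whole row record is visited to read one field."""
    return sum(row[column] for row in rows)


def to_column_store(rows):
    """Pivot the rows so each column's values are stored together."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def total_column_store(columns, column):
    """Column layout: only the one requested column is read."""
    return sum(columns[column])


def run_length_encode(values):
    """Collapse runs of repeated values into [value, count] pairs."""
    encoded = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1][1] += 1
        else:
            encoded.append([value, 1])
    return encoded


columns = to_column_store(rows)
print(total_row_store(rows, "amount"))        # same answer either way
print(total_column_store(columns, "amount"))
print(run_length_encode(columns["region"]))   # repeated values compress well
```

Both layouts return the same total; the difference is how much unrelated data has to be touched to get it, which is where a real column store would save work on a wide warehouse table.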
At the same time, Sybase has funded column-based research since the mid-1990s. An also-ran in the database world, Sybase saw revenues up 70% last year “because the column approach yields better query performance,” says Sybase Engineering VP Richard Pledereder.
Column-oriented DBMSs require rethinking the data and indexes because the transaction is not the central idea. Instead, the data architect must think in terms of collections of similar records, and subject based indexes rather than transactional element indexes.
One company, Sonian Networks, archives e-mail for other businesses, housing the data in Vertica’s data warehouse on Amazon.com’s Simple Storage Service. Which segues into the platform-as-a-service model. Sonian expects its warehouse to grow from a few terabytes to a petabyte sometime in 2009. And to deliver, Sonian relies on infrastructure hosted elsewhere. Sonian develops the warehouse platform and releases to a hosted environment. Their clients never miss a beat and always have the most up-to-date platform. Which reminds me of the Salesforce.com platform-as-a-service model.
If you haven’t heard, Salesforce.com, Oracle, and Google have partnered to bring PaaS to an application near you. In this model you can develop business solutions on the Salesforce.com APEX platform that target every end-user device without having to develop custom code for each device and, drumroll here, without having to manage the infrastructure behind the applications. Right now companies must manage the intricacies of their infrastructure, along with devoted staff, to ensure users can perform their business functions. With PaaS, your company can continue to narrow its strategic and tactical focus to the services and business solutions that matter, and can offload the infrastructure responsibilities to partners who do this effectively. Okay, we’ve heard that before. Yes, except that Salesforce.com signed up 100,000 customers soon after its announcement. With a goal of disrupting the Microsoft model of software delivery, we’ll see where this goes.
- Andy


