Describe Your Day
June 12, 2008
When analyzing information, one tends to always limit one’s view to a certain time frame. Whether by a specific date or over a span of days, data is often meaningless unless you can put a box around it. Obviously other filters come into the mix (product, geography, corporation, etc.), but the common denominator is most often time.
Having been a part of a number of data warehouse projects, the “Date” dimension is a common debate session and is often re-invented from project to project. Why? When it comes to high pressure deadlines and deliverables, why can’t we have a cookie cutter Date dimension that can be tailored and tweaked as necessary to meet your project needs? Well, we can, but first we must decide on what the cookie cutter attributes would be for the dimension.
Take a few moments and see how many different ways you can describe a day. Tick, Tick, Tick. How many did you come up with….5, 10, 20? I came up with 84 (including a few found courtesy of internet searches) and then stopped. Some examples….actual date (duh Dave!), Month name, Week starting date, Quarter number, Weekend indicator, Fiscal Year, Last day of Month indicator, Day number of the year, etc. Obviously some of these attributes can be readily calculated using database SQL functions, however when dealing with large data volumes, reading directly from a table vs. computing on the fly is preferable from a performance standpoint. You also don’t want to burden an end-user to have to understand how to strip out the day portion of a field formatted in MM/DD/YYYY.
The Date Dimension is one of the few dimensions in the data warehouse that you can pre-populate once and forget about it. With 365 days per year you can populate your table with 50 years of data for the low low cost of 18,250 records (give or take a leap year or two)…peanuts in DW speak. Now give your user 84 different ways to describe a day. Pretty powerful analytics to browse through. How many sales are closed on the last day of the month? What is the average attendance of a baseball game on a weekend compared to a weekday or a holiday? How many weeks has your system been operational since its inception?
One common debate topic on the Date dimension is whether or not one should use intelligent or non-intelligent surrogate keys. Surrogate keys provide a means to relate a metric/fact (i.e. Sales revenue) to a dimension (i.e. Date, Product, Sales Rep) that describes that metric/fact. Common practice is to use non-intelligent surrogate keys in order to not tie the data warehouse to business codes that could conceivably change over time. An example of a non-intelligent key would simply be a numeric field that automatically increments each time a new dimension record is added to the table. Meaningless to you and me, but insures that the connection between the metric/fact and attribute will never be broken. One could argue that the Date dimension is not susceptible to changing over time. For instance, if we took today’s date and converted it to an intelligent key of 20080612, this would never change over time. I could then write queries directly against my business metrics and limit them based on my interpretation of a date without having to perform a database join to the Date dimension resulting in a faster query. Something to consider though is a common practice of placing records within a dimension to associate metrics that have an “invalid” or perhaps “not applicable” dates associated with the transaction. If my Date dimension has a intelligent numeric surrogate key, that I have to come up with some bogus key (i.e. perhaps 9999999) to hold an “invalid” value or maybe -1 to mean “not applicable”. Now it becomes difficult to interpret these values on the fly. With a non-intelligent key, you are able to make these “non-dates” easily identifiable by simply including another attribute such as Type of Date.
What about Time of Day? Time is a bit different. How many different ways can you describe 1:00? AM/PM, Work shift (maybe). Normally time attributes are linked within the context of a day. For this reason you normally don’t see a separate Time dimension and in fact, it is becoming more acceptable to just include the date/time combination as another kind of metric on your transaction while still allowing for the additional 84 ways to describe the day.
So while we are often pressed to work faster and better everyday, it makes sense to take some time out to build your toolbox of re-usable components. The Date dimension is a good place to start. Create the dimension in a data modeling tool such as Erwin. You can then create various data definition language (DDL) scripts to a variety of database platforms (Oracle, Microsoft, etc.). Build a one-time CSV file to populate the table and you can even eliminate the need to ETL the data into the table by creating Insert commands with the scripts. Might not be the fastest way to actually insert the data into the database, but with a one-time operation that will occur on your time, why bother with something more elaborate?
Anyone tell me if 5/20/2041 is a US holiday without checking?
Dave
Comments
One Response to “Describe Your Day”
Got something to say?


















Dave - Interesting perspective. If only we could all use the Julian date format! Yes, the date, 5/20/2041, is in fact a holiday. It’s the day that the appreciation of soccer in the US will pass the level of appreciation that the Mongolian’s have for hand gliding!