Putting your baby to bed :-(

July 10, 2008

Well, not a real breathing, baby food eating, diaper filling baby anyway.  What I’m talking about it is retiring a system that you were also responsible for creating and implementing.

In the beginning, there was a problem.  The problem was the client’s inefficiency in closing the financials and reporting said financials out to the masses.  Enter LÛCRUM.  LÛCRUM has a long history of solving complex data problems.  LÛCRUM delivered an enterprise data warehouse that allowed integration of various financial systems into a single repository with a common vocabulary that facilitated timely financial reporting to the organization.

Fast forward 4+ years.  The data warehouse keeps on chugging.  It performs monthly financial closing processes admirably.  Data is still integrated from numerous sources.  People still depend on the “numbers” that come from their financial reporting.  Enter “change”.  The client has been acquired by another company.  The buying company also has a data warehouse and decided to integrate the two together. 

Before we can decommission the data warehouse, the financial processes need to be migrated.  The buying company handles the development aspects of the migration to the new architecture and BI toolset; however the existing support team is the SME with respect to the existing system.  Discussing, detailing, and discussing again the AS-IS requirements with the development staff explaining what has to be re-created, the certain business nuances to be aware of, performance and tuning considerations, and other “why’d you do it that way” discussions.

As with all systems reviewed in retrospect, you’re going to find some things that you’d might change if you could have a do-over, however for the most part, LÛCRUM’s EDW has stood the test of time and required very little enhancements/rework.  Here are some observations that I believe resulted in a stable, high performing and well accepted data warehouse:

1.       Implemented Ralph Kimball’s methodology.

2.       Shielded the data warehouse from source system changes based upon SLA’s between the two parties.  When the source systems underwent changes, they were responsible for maintaining a standard data submission format.

3.       Shielded the BI tools from the underlying database tables.  All interaction was through database views thus allowing the data warehouse to make modifications along the way that minimized downstream impacts.

4.       Utilized an enterprise job orchestration tool.  This allowed for interaction of various upstream and downstream systems in a packaged application capable of communicating with various system platforms.

5.       Due to large data volumes, special attention was made to insure high performing interaction with data.

6.       All “heavy” lifting of data was performed at ETL time resulting in simple SELECTs from the BI toolset.  This prevented the client from being tied to a BI toolset.

7.       An online meta data repository was utilized to allow users to understand what data was in the data warehouse and/or what a particular business term meant.

8.       Visual production support processes were implemented that permitted the support team to be proactive in working through issues rather than being reactive to problems.

9.       Extensive testing was performed during development and implementation.  Unit, String, Regression, Performance, and User acceptance.  Almost an obnoxious amount of testing, but well worth it in the end.

10.   LÛCRUM built it! (Come on, you saw that coming didn’t you?)

 

Next steps are the decommissioning project.  We are figuring out what data needs to be archived for potential future reference.  Where will this subset of legacy data reside?  How might it be accessed?  What dependencies exist with the server architecture?  What parties need to be involved (server operations, UNIX team, storage team, dbas, etc.)?  How quickly can it be done?

All good things must come to an end.  I am proud to say that I was a part of this initiative at the start as well as at the end.  Our solution solved a problem.  Now it’s someone/something else’s turn to reach up to the bar that has been set.

Time to start singing a lullaby…

Dave

Describe Your Day

June 12, 2008

When analyzing information, one tends to always limit one’s view to a certain time frame.  Whether by a specific date or over a span of days, data is often meaningless unless you can put a box around it.  Obviously other filters come into the mix (product, geography, corporation, etc.), but the common denominator is most often time.

Having been a part of a number of data warehouse projects, the “Date” dimension is a common debate session and is often re-invented from project to project.  Why?  When it comes to high pressure deadlines and deliverables, why can’t we have a cookie cutter Date dimension that can be tailored and tweaked as necessary to meet your project needs?  Well, we can, but first we must decide on what the cookie cutter attributes would be for the dimension.

Take a few moments and see how many different ways you can describe a day. Tick, Tick, Tick.  How many did you come up with….5, 10, 20?  I came up with 84 (including a few found courtesy of internet searches) and then stopped.  Some examples….actual  date (duh Dave!), Month name, Week starting date, Quarter number, Weekend indicator, Fiscal Year, Last day of Month indicator, Day number of the year, etc.  Obviously some of these attributes can be readily calculated using database SQL functions, however when dealing with large data volumes, reading directly from a table vs. computing on the fly is preferable from a performance standpoint.  You also don’t want to burden an end-user to have to understand how to strip out the day portion of a field formatted in MM/DD/YYYY.

The Date Dimension is one of the few dimensions in the data warehouse that you can pre-populate once and forget about it.  With 365 days per year you can populate your table with 50 years of data for the low low cost of 18,250 records (give or take a leap year or two)…peanuts in DW speak.  Now give your user 84 different ways to describe a day.  Pretty powerful analytics to browse through.  How many sales are closed on the last day of the month?  What is the average attendance of a baseball game on a weekend compared to a weekday or a holiday?  How many weeks has your system been operational since its inception?

One common debate topic on the Date dimension is whether or not one should use intelligent or non-intelligent surrogate keys.  Surrogate keys provide a means to relate a metric/fact (i.e. Sales revenue) to a dimension (i.e. Date, Product, Sales Rep) that describes that metric/fact.  Common practice is to use non-intelligent surrogate keys in order to not tie the data warehouse to business codes that could conceivably change over time.   An example of a non-intelligent key would simply be a numeric field that automatically increments each time a new dimension record is added to the table.  Meaningless to you and me, but insures that the connection between the metric/fact and attribute will never be broken.  One could argue that the Date dimension is not susceptible to changing over time.  For instance, if we took today’s date and converted it to an intelligent key of 20080612, this would never change over time.  I could then write queries directly against my business metrics and limit them based on my interpretation of a date without having to perform a database join to the Date dimension resulting in a faster query.  Something to consider though is a common practice of placing records within a dimension to associate metrics that have an “invalid” or perhaps “not applicable” dates associated with the transaction.  If my Date dimension has a intelligent numeric surrogate key, that I have to come up with some bogus key (i.e. perhaps 9999999) to hold an “invalid” value or maybe -1 to mean “not applicable”.  Now it becomes difficult to interpret these values on the fly.  With a non-intelligent key, you are able to make these “non-dates” easily identifiable by simply including another attribute such as Type of Date.

What about Time of Day?  Time is a bit different.  How many different ways can you describe 1:00?  AM/PM, Work shift (maybe).  Normally time attributes are linked within the context of a day.  For this reason you normally don’t see a separate Time dimension and in fact, it is becoming more acceptable to just include the date/time combination as another kind of metric on your transaction while still allowing for the additional 84 ways to describe the day.

So while we are often pressed to work faster and better everyday, it makes sense to take some time out to build your toolbox of re-usable components.  The Date dimension is a good place to start.  Create the dimension in a data modeling tool such as Erwin.  You can then create various data definition language (DDL) scripts to a variety of database platforms (Oracle, Microsoft, etc.).  Build a one-time CSV file to populate the table and you can even eliminate the need to ETL the data into the table by creating Insert commands with the scripts.  Might not be the fastest way to actually insert the data into the database, but with a one-time operation that will occur on your time, why bother with something more elaborate?

Anyone tell me if 5/20/2041 is a US holiday without checking?

Dave

Do you know what’s running?

May 29, 2008

Answer the following questions…How many automated processes are running to support your organization’s operational, tactical, and strategic systems?  When specifically do they run?  Where are they run?  What dependencies exist between them?  Not easy questions to answer given that most organizations will support multiple flavors of systems.  Microsoft, Oracle, Peoplesoft, Mainframe, UNIX, SAP, etc. all have some sort of job scheduler or better phrased workload automation capability.  The problem is most are good at only running processes specific to their own platform or software.

If you support or have supported a system or application (who hasn’t?), keeping track of what processes run throughout the day (or go bump in the night) can be a formidable challenge.  Enter an enterprise workload automation solution.   These solutions help bridge the gap between various systems and infrastructures.

Providing a centralized application that is solely responsible for running required processes (i.e. database backups, journal entry postings, data warehouse loading, system tune-up routines, application health monitoring, etc.), better equips a support team to monitor and respond to events and in some cases become proactive rather than reactive to issues.

Workload automation applications are built to execute workload jobs.  Jobs are basically anything that can be run from a command line prompt.  Dependencies between jobs can be set that can range from simple (Job B runs after Job A) to complex (Job C in Application X runs after Jobs D & E in Application Y and Job F in Application Z).  They often provide functionality that allows for time dependent execution as well.  This allows an application to only run a job during certain times of the day or alert support if a job is running longer than expected.  In the event that a server can only handle a certain amount of workload before become overly taxed, workload automation applications can be configured to only submit a certain number of simultaneous jobs at a given time.

It is not uncommon for organizations to have multiple “calendars” for their systems.  Fiscal days vs. calendar days in the Accounting department for instance.  Workload automation tools allow for custom calendars to be used by applications.  Need to run system maintenance routines on Christmas or 4th of July…setup a holiday calendar.  Need to close the books on the 9th working day of the month, setup a financial closing calendar.

The workload applications are centralized on a workload server.  This provides visibility into what applications affect what servers.  You can also see when they affect them.  If a system needs to be scheduled for service, the workload server can be a good repository to determine when the best “window of opportunity” for the service would be.  All output of job submission and execution is logged on the workload server.  Metadata such as start & stop times, completion states, script output, etc. is available for analysis.  This proves to be an excellent source for understanding system SLA adherence as well as application/job trending for potential future maintenance issues.

All systems run processes “behind the scenes”.  Just because these are invisible to the end-users doesn’t mean they are meant to be forgotten.  A workload automation solution can be a good “one stop shop” for the management of these processes.  Something to consider if you have to go to 5 different places to determine what ran when and where did it fail?

Dave

Oracle supports Microsoft

May 16, 2008

I can’t tell you how many times I’ve been in conversations around the topic of “Oracle vs. Microsoft”. I’ve heard both sides of the story ranging from “SQL Server for mission critical operations…are you crazy!” to “Oracle costs me my first born child…year after year!”. While these discussions are often entertaining, the line delineating the two database giants is blurring by each subsequent release.

In my years consulting for LÛCRUM, I have worked for numerous clients that have had installations of both Oracle and Microsoft running in their environments. With recent statistics estimating that Oracle controls >50% of the database market and Microsoft controlling >50% of the server operating system market, are you surprised? SQL Server only runs on Microsoft. Oracle offers more operating system versatility. While you’ll see UNIX and Linux installations, Oracle’s ability to run on Microsoft remains strong and they are improving their functionality with respect to Microsoft development. Where might an Oracle database deployed on a Microsoft server make most sense? In the small and mid-sized business market (SMB). In the SMB market, Oracle has competitively priced versions such as Oracle Database Standard Edition and Standard Edition One.

So what advantages does running Oracle on Microsoft have to offer? First, Oracle has tight integration with Active Directory and Windows Security Framework. Items such as single sign-on and security via database role and Active Directory group fall into this category. Next, Oracle offers 32-bit and 64-bit versions. In the 32-bit version, Oracle is able to utilize up to 3GB (out of a 4GB O.S. maximum) of system memory for database use. Finally, Oracle has also been working on enhancing its ability to integrate with the Windows development suite, specifically Visual Studio 2008. Oracle supports .NET in 3 ways. The Oracle Data Provider for .NET leverages ADO.NET API and allows .NET applications to access Oracle data. These APIs should be familiar to most Microsoft developers. In addition, through an add-in (free for that matter), developers can work with Oracle services via Visual Studio 2005 (and 2008 as previously mentioned). Through the development suite, developers have access to various wizards to perform various database tasks (i.e. DDL), a procedure editor (for PL/SQL procedures, packages, and functions), a Debugger for runtime error interaction, and integrated help for items such as Oracle error reference, SQL, and PL/SQL user manuals. Lastly, Oracle has integrated .NET extensions directly inside the database. This allows developers to created stored procedures and functions using C# or VB.NET within Visual Studio. This code can then be deployed to the database and referenced wherever a stored procedure or function is permitted.

Oracle has shown it is advantageous to offer solutions that fit neatly into an operating system that controls the majority of the server market, even if that vendor also happens to be a major competitor in the database market. Offer a product that is extensible and easy to use with development GUIs is sure to give you a seat at the table when it comes to choosing a solution for your organization. That is precisely why Oracle supports Microsoft (most of the time <grin>).

Dave

Get a handle on Unstructured Data

May 8, 2008

One of the big topics in data management these days is Unstructured Data.  What is it?  Word documents, spreadsheets, video, images, email, and instant messaging are a few examples.  How does one harness the wealth of information contained in these non-standardized formats, IF you are trying to capitalize on your existing data management infrastructure?  Microsoft has attempted to answer this question with its upcoming release of SQL Server 2008 (SS2008). 

Due out later this year, SS2008 provides built-in support for Unstructured Data through the FILESTREAM functionality.  FILESTREAM combines the power of a relational database platform with the storage flexibility of a NTFS file system.  This is accomplished by storing references within the database to binary large object data (BLOBs) residing on the file system.  In this fashion, SS2008 manages access and interaction with the information, but is not responsible for the direct storage of it.  Unstructured Data can be accessed through typical Transact-SQL statements or via Win32 API calls.  FILESTREAM is a good option to consider when objects being stored are larger than 1 MB in size and is limited only by the volume size of the underlying file system.  If objects are <1 MB on average, you’ll get better performance by using the Varbinary(max) data type directly within the database.

From a security standpoint, FILESTREAM fits neatly into the database.  If a user has permission to query a table and column containing FILESTREAM data, they are able to access the Unstructured Data.  This access however does not carry forward at the file system level.  Only the account running the SQL Server service account has access to the files at the file system level.    

Is this only way to deal with Unstructured Data?  Of course not, but it is an option.  There are some limitations when using FILESTREAM with other SS2008 functionality.  Special consideration needs to be addressed when utilizing Database Snapshots, Mirroring, Replication, Log Shipping, and Clustering.

Continue to browse through other blogs on www.thefuturevalueofbusiness.com to see conversations on SharePoint 2007 and its role in taming Unstructured Data.

Dave

Dashboards for Dentists

May 1, 2008

This week I had my bi-yearly checkup with my dentist. I have never really minded going to these appointments. For the most part, other than always being gently scolded for not flossing regularly (come on, it’s a pain to do!), my visits are routine and without surprises. Over the last 10 years or so, one item I have always puzzled about is the WALL of patient files that exist behind the receptionist’s desk. How many patients’ records exist in that wall? How many are active patients? How many versions of x-rays exist per folder? What happens if the office goes up in smoke or is the victim of water damage? Does the history of the patient disappear?

Well this visit was a bit different, let me explain. I sat down in the chair and my hygienist explained it was time for x-rays. Ok no problem…put on lead vest…open wide and bite down on film…absorb some radiation…print out x-ray…review x-ray on white screen. To my surprise, my assumed process stopped at “absorb some radiation”. Instead of printing out the x-ray to film, the results of my x-ray immediately displayed on a LCD monitor next to my chair within “Dave’s dashboard”! My dentist had recently installed a new system and was in the process of converting the WALL into the digital age.

So what did “Dave’s dashboard” include?

  • A repository of x-ray films. This allows the dentist to quickly move between versions of films and allows him to monitor changing patterns in the mouth (tooth gaps widening/shrinking, jaw alignment, etc.)
  • Historical view of past visits. Included were procedures performed, costs associated, insurance company billed, future scheduled visits, etc.
  • A graphical representation of all the teeth in MY mouth. It showed my incisors, my molars, etc. But the neat part was that if something was “special” about a tooth, that “special” was represented in the graphic. Fillings were shaded grey. Cosmetic work was green. Areas “being watched” were blue. I was told areas with potential enamel problems would be another color.
  • A bunch of other “tabs” of information that I wasn’t able to view (couldn’t ask, mouth was full)

How cool is that? All information about a patient was online and accessible in a concise easy to read format. A format that can be shared directly with the patient to help them understand why their tooth is hurting or what their teeth may look like “after” a cosmetic change.

I attempted to ask my dentist about the software to get a feel for what technologies were used to create this. Obviously he didn’t have a clue, but it was running in Windows Vista and appeared to be client/server in nature. Not only did this appear to make the dentist and hygienist’s job easier from a paperwork perspective, but now this vital information was easily maintained, backed up and tucked away in the event of an office disaster. The reliance on the WALL was subsiding. So in a world where businesses are consistently trying to improve themselves, exposing the same old information in new exciting ways may just help turn on a light bulb to a new way of thinking or acting/reacting…even a dentist.

If I ever need to change dentists, I now only have to request my records be forwarded electronically to my next dentist. I don’t plan on doing so anytime soon though. I very pleased with my current one and he’s getting hip with the times and using software/technologies that I have expressed interests in.

Now where is that floss…

Dave

How to build a better team

April 24, 2008

We’ve all been involved on a team at one point in time.  This team could have been family, sport, education, business, etc. focused.  Each team member brought unique skills that may or may not have been relevant to every task at hand.  Some members were more experienced, some less, but all had a common goal they were driving to.  Hopefully someone had the map and directions though.  A team is unable to perform optimally if they are going fast or working hard, but for the wrong reasons.  In addition, to help team members understand each other’s strengths and weaknesses, a certain amount of team bonding or better phrased team building is often warranted.

Team building comes in a variety of shapes and sizes.  It can be as simple as coordinating team meetings where each member has the turn to “take the wheel” and share his or her ideas and concerns on the current direction.  Perhaps building comes in the form of regular gatherings ranging from daily lunches to off-hour social events.  Teams really interested in strengthening their interaction may benefit from a formal team building event run by professionally trained organizations/individuals.  In any case, some form of team building should occur.

I recently attended a team building event conducted by an organization specializing in team development.  We had the opportunity to bring together two teams in order to help improve our daily interactions as well as quash any of the “us vs. them” mentality.  This event involved a day and a half of activities at a facility away from the distractions of the daily grind.  Placing these teams in an unfamiliar setting (with BlackBerries off!) allowed us to better concentrate at the task at hand and more importantly, each other.

The event started with your typical icebreaker to get people warmed up and engaged, followed by some relevant discussions pertaining to our business.  Because this was an overnight retreat, we were able to get to know each other on a more personal level and discuss a variety of topics while enjoying a campfire through the evening.  The following day consisted of continued business discussions broken up by various team challenges.  In my opinion, it is in these challenges where you get to see the real team building occur.  Challenges ranged from mental puzzles to physically demanding events.  Depending on the challenge, we either worked as a collective team or were paired into smaller groups.  It was interesting to see how these teams worked towards a solution to the problem or challenge at hand.  Did the solution require brute force strength, mental stimulation, prior experience, or delicate handling?  One wasn’t sure at the start, but when you get a team of individuals throwing ideas around (or in some cases not throwing ideas around), the team begins to understand how they work best together and where they have room for improvement….thus strengthening their team resolve.  In the end, one gets a great sense of accomplishment working with others to solve a common problem.

After the team building event concluded, did we have an answer for all of our current business challenges…no.  Did we have a better understanding of how each person approaches a given situation…you betcha.  We also learned that when you’re in a pinch, “The Hand of John” may just reach down and help you out!

Your team member,

Dave

Is that really a Source System?

April 17, 2008

Last week I needed to purchase a new memory card for my daughter’s digital camera.  Because I rarely go to a brick-n-mortar electronics store anymore, I headed to Amazon.com.  After a few minutes of browsing, I found exactly what I was looking for.  I added the item to my shopping cart and proceeded to checkout.  As I was entering my payment/shipping information, I realized that I wasn’t actually buying the memory card from Amazon, but rather from a reseller on the site.  No problem, I didn’t necessarily care who I got it from.   I just wanted a product that met my needs at a reasonable price.  So in this case Amazon provided the service and another vendor provided the goods.  Daughter happy = Daddy happy.

Take this scenario into our consulting world and in particular, data warehousing.  A data warehouse is a repository of data.  Data is collected from various source systems residing throughout a company’s infrastructure and integrated, consolidated, and aggregated in a meaningful manner for decision making.  The source systems provide the service, but is the data they are providing necessarily originating from them?  Maybe yes, maybe no.  Do I care?  Maybe yes, maybe no.

The purist will say that if data isn’t originated in a system, you should keep swimming upstream to the “ultimate source”.  In this fashion, you’ve reached the system of origin and life is good.  But what if this information isn’t meaningful until it is been run through the company’s legacy costing model written in a proprietary system and supported by Bob who runs “process X” twice/month and doesn’t know much more than that? 

Given today’s ever increasing pressures on delivering more value in a shorter timeframe, is it better to deliver the goods to fulfill the customer’s need or improve the service by which the goods are delivered?   It’s a balancing act of business value, effort, and time.  I’m all for improving processes.  The cleaner the process, the more maintainable it becomes.  However, my job is also to meet my customer’s expectations.  One of the biggest values in data warehousing is its ability to “Unhide Data”.  I’ve come across numerous projects that have spun their wheels in source system analysis.  Manual processes were perceived as bad and had to be improved.  The timeline for delivery didn’t change, as a result, later phases (design, development, and oh my gosh testing!) just shortened.  Was that good time spent?  In most cases, the final answers came back as “leave the source alone, it’s working”.  Hours/days were lost and now the team had to work harder and longer with greater stress to meet deadlines.  Because of this, I would rather deliver solutions in multiple phases.  The first phase delivers the quick value (no need to look at the man behind the curtain).  Subsequent phases can look into the feasibility of streamlining manual processes and/or swimming past the legacy costing model.  Choose your battles and move on.

So which is it, give me the service or give me the product?  The answer is both…just give me a solution that delivers it.

Dave

Certified or Certifiable?

April 10, 2008

Ok, so you’ve been around the block a few times and think you know your stuff when it comes to Microsoft technologies.  What better way to prove it than to obtain a formal certification in your field of expertise?  Well for one, where do you start?  In a sea of dizzying acronyms (MCSD, MCSA, MCITP, MCPD, MCTS, MCSE, MCDBA…), how is one to survive the turbulent waters and inherent confusion of this journey?  Better yet, how are your customers supposed to understand these 4 character “feathers in your cap” that everyone displays proudly on their resume and business cards?  Good questions…

A number of our own LÛCRUM certifiables have navigated these waters in the past and to date they are the proud recipients of 15 such certifications.

Thankfully Microsoft has taken a fresh approach at this process.  Certifications now fall under 1 of 3 categorizations.

Technology Series – This series exhibits core technology skill on a particular product/technology.  Basically nuts/bolts stuff.  Typically requires 1-3 exams.  Certifications in this series retire when Microsoft product support ceases, thus they have a limited lifetime.

Professional Series – A “step up” from the Technology Series, this series adds job roles in addition to technology competency.  It demonstrates your ability to deliver solutions within that role.  This series typically requires 1-3 exams with a Technology Series certification prerequisite.  It also requires periodic recertification.

Architect Series – The Mac Daddy of Microsoft certifications.  This series displays your business IT prowess in addition to in-depth technology acumen to deliver enterprise capable business solutions.   This involves a rigorous entry process and a formal oral review board (conducted by peers already possessing an Architect certification) at the conclusion of the certification process.  It requires periodic recertification as well.

Across all of these series, only 4 certifications remain from the previous multitude of acronym chaos.  They are Microsoft Certified Technology Specialist (MCTS), Microsoft Certified Professional Developer (MCPD), Microsoft Certified IT Professional (MCITP), and finally Microsoft Certified Architect (MCA).   Each of these certifications allows for specialization in a variety of areas, however, the certification remains consistent.   Once obtained, your branding for resumes and business cards simplifies to your certification plus specialty (i.e. MCITP – Server Administrator).

So how does one obtain these new certifications?  Depending on the series…study, study, study and perhaps practical work experience.  Microsoft has detailed all the required exams for each certification/specialty combination.  Of course a wealth of choices exists for getting you to the exam desk.  Books, webinars, study groups, classroom learning, user groups, and conferences are all viable vehicles.  Practical work experience just takes time and exposure.  As a result, certification takes time and patience and is not something done overnight.

If you’re interested in more information, check out Microsoft’s certification website… http://www.microsoft.com/learning/mcp/default.mspx

Happy learning…

Dave