Business Intelligence - unlock your hidden data 

   
Do You Need A Data Warehouse?
By Clive Margolis

Anyone involved in Business Intelligence will probably try to convince you that you need a data warehouse. Business Intelligence and data warehousing go hand in hand. In most cases any organisation serious about measuring its performance will have one. But this doesn’t answer the question why you might need one. So what are the benefits of building a data warehouse, and what are the drawbacks? Is there a viable alternative?


The Problem
It would be great if decision-makers could simply go to their computer systems and pull out the information they need. Unfortunately it usually doesn’t work like that. Here are just a few of the reasons why not:

- It can take too long to retrieve the data they need
- The data may need to be summarised for it to make much sense
- The data may not be organised right for useful analysis
- The information you need isn’t even in the computer systems!

All of these issues can be addressed by a well-designed data warehouse.


But what is a data warehouse?
A data warehouse is a database that is specifically designed to produce information for reporting and analysis. “Reporting and analysis” can be anything from traditional paper listings, to hi-tech dashboards, email alerts, ad-hoc reporting and drill-up, drill-down analytics. The data is often gathered from several sources – including non-databases like text files and spreadsheets – merged into one place, tidied up, and reorganised into an analysis-friendly structure.

What you find out
Once you start on building a data warehouse you may discover all sorts of things about your data. Here are some of them:

- data is often inaccurate
- data is often inconsistent (it is in two places but is not the same in both places)
- data is often invalid (for example a date field contains some strange characters)
- the same data can mean different things to different people in the organisation

Because of all this you have to go through processes such as “data mapping” and “data cleansing” to present a clear, consistent view of your data.

The data you might need for reporting and analysis can also be inaccessible – maybe it is in the wrong format or blocked by security. These issues must also be overcome before the data can be made available for reporting.

Other important things that might happen to the data to make it report-friendly include:

- summarising it
- performing calculations
- reorganising it

(see also my website www.acestar.co.uk/Data-Warehousing-Overview.htm for a short description of a data warehouse.)

Why bother?
In order for a data warehouse project to be a worthwhile venture, the benefits must outweigh the costs. Fortunately the benefits can be pretty impressive if not essential. A 2009 study by the Aberdeen Group* concludes that

“companies actively building and using dashboards are able to gain visibility into metrics that are driving their business, and as a result are achieving drastically (my italics) higher performance than their peers.”

As recent studies from Business Intelligence research companies such as The Aberdeen Group, Forrester and Gartner show, the competitive benefits of being able to analyse your company’s (and other) data are becoming more and more valuable and necessary. Data warehousing can be a vital component of this analysis.
Data warehouse design - dimensions and facts
Probably the most useful way to reorganise the data for reporting and analytics is to split it up into two types of tables known as dimensions and facts.

Dimensions: Dimensional information is typically descriptive. Descriptive information is typically alphanumeric, although not always . For example customer name and address are descriptive and would be stored in a Customer dimension table.

Another very common dimension contains a set of calendar dates and is known – not surprisingly – as a Date dimension. Though it may seem a no-brainer, a date dimension can be a very useful place for storing all sorts of date variants such as accounting week number and various date formats – so the same date might be stored as a long date (month is spelled out in full) and a short date (dd/mm/yy – or mm/dd/yy).

Facts: Facts are usually numeric, for example the total sales recorded against the customer on a particular date.

Dimensions and facts - putting it all together
Putting the above information together we are able to produce a report which contains

Customer
Date
Total sales

Even this is a good start, because we can now produce reports showing total sales, broken down and sorted by date and customer. Sales held at a daily level in the data warehouse can be summarised to show total sales by customer by month.

Adding more dimensions
As we add more dimensions to the data warehouse we can break down the report in ever more sophisticated ways. For example, we might have a Sales Team dimension which includes Country, Region, Sales Area and Sales Team. Now we can sort, group or summarise our report by any of the following:

Customer
Date
Total Sales
Country
Region
Sales Area
Sales Team


The data warehousing process
The diagram below shows a typical data warehouse build. Data is gathered from one or more sources then loaded into the data warehouse in a process known as ETL – extraction, transformation, load.

In the extraction phase data is gathered from various sources and usually placed in a ‘staging’ database.

In the transformation phase the data is verified, then manipulated in various ways to get it into the right shape for the data warehouse design.

In the load phase the transformed data is loaded into the data warehouse ready for use.

Notice that the process is repeated, usually on a daily basis to reflect changes and additions to the underlying data.

 


What are the alternatives?
You can of course get reports without having to build your own data warehouse. You could report the data direct from your computer systems, or make a copy of the data to use specifically for reporting. Here are some of the options:

1. Get the data direct from source. That usually means the computer system it was originally stored in, say a financial or customer relationship management (CRM) system.

Advantages:
- Cost savings as little or no development is required

Disadvantages:
- Might slow down other system users

- Nowhere to do calculations useful for reporting

- Not designed for dimensional reporting

- Performance may be compromised

- Data organisation may be haphazard and unplanned

- Vital data might not be available to the reports

Conclusion: This is generally not the best solution for BI reporting, although it can work in some situations, for example where there are few users of the systems (assuming access to the data is permitted), limited reporting requirements and only simple manipulation of the data needed for reporting.

2. Use a data repository. This basically means copying the data you are interested in away from your transactional systems so it can be used exclusively for reporting. Sometimes the copy process is built into applications you may already be using.

Advantages:
- Performance is improved by the ability to tune the repository database for performance (for example by adding indexes) and also not having to compete with users of the transaction system

- There is often a limited scope for enhancing the data repository by adding new fields and possibly calculations

Disadvantages:
- This is still a halfway house to having your own data warehouse. It tends to work best when all the data you need comes from a single system, for example a CRM system. As such you don't get the degree of reporting flexibility that having your own purpose-built data warehouse offers

- Little or no scope for multiple data sources

- Data is not structured for reporting

- Data cleansing often nonexistent or less thorough so results can be inconsistent or inaccurate

Conclusion: A data repository will address some of the issues of Business Intelligence reporting at a reduced cost. However, this type of system leaves little room for developments to meet future requirements.


3. Use on-demand Business Intelligence
Also known as software-as-a-service (Saas), a growing number of companies are offering data analysis online, on a subscription basis. This is something I talked about in Newsletter #1. It’s a fast-growing market and one that can only continue to grow for the foreseeable future. In brief:

Advantages:
- Low start-up costs

- Provides a test bed for new ideas

- Could be useful for small dynamic requirements

- Could be useful for ad-hoc reporting

Disadvantages:
- Newness

- Security concerns

- Loss of control

- Data used might be unreliable

- Ongoing costs may rise

Conclusion: While many organizations large and small are already using Saas as part of their overall solution, most are probably testing the waters with this delivery method. Smaller organizations – for whom data warehouse development would be too expensive – may see Saas as an alternative to a data warehouse solution, mainly to provide graphical analysis such as charts and dashboards.


What are the benefits of having a data warehouse?
Now down to the nitty-gritty, as they say here in England. The benefits of data warehouse are many, and include the following:

- Flexibility and extendibility. Most of the effort in producing a data warehouse is in the initial design and build. Once built, it is relatively easy to add dimensions and facts to the existing design. Because you own the design, you can make it into anything you want. The scope for ad-hoc on-demand reporting is limitless

- Speed of reporting. You don’t need to compete with other (external) users for processing power. You can speed up reporting if necessary because you control the environment

- Security. You have control over data security

- Control over your data. Reporting data is separate from transactional data, consistent and accurate. Data can be merged from various different sources, for example CRM systems, financial systems and even data held on spreadsheets

- Ability to create your own KPIs (key performance indicators). You can do calculations on your data 'offline' so the results are waiting for you when you need them

- May be the only alternative. Sometimes you cannot produce reports any other way. For example, maybe you need to combine data held in so many places that you just don't have the processing power to do it. You simply have to combine the data to begin with, usually in a data warehouse

Tried-and-tested solution. You are less likely to run into development difficulties when using proven data warehousing technologies

What are the disadvantages?
The disadvantages of data warehousing largely come down to cost. Not just the cost of building it but also of maintaining and administering it. All of the functions listed below not only cost money but also require highly-skilled people to put them into action:

- Requirements gathering and analysis

- Data warehouse design

- Building the data warehouse

- Testing the data warehouse

- Administering the data warehouse

To sum up
BI professionals nearly always recommend a data warehouse, and so would I wherever practical. It's not just because – let's face it – we like building them. It's also because since we work with them all the time, we know that many of the benefits of having a data warehouse are long-term. Once you've made the investment you'll be glad you did providing you make good use of it.

However, other alternatives exist, which can work well when employed in addition to or even instead of your own data warehouse. These alternatives are – without doubt – becoming more interesting!

*Executive Dashboards – The key to unlocking double-digit profit growth. Lock, M (2009)