|
Do You Need
A Data Warehouse?
By Clive Margolis
Anyone involved in Business Intelligence
will probably try to convince you that you need a data
warehouse. Business Intelligence and data warehousing
go hand in hand. In most cases any organisation serious
about measuring its performance will have one. But this
doesn’t answer the question why you might need
one. So what are the benefits of building a data warehouse,
and what are the drawbacks? Is there a viable alternative?
The Problem
It would be great if decision-makers could simply go
to their computer systems and pull out the information
they need. Unfortunately it usually doesn’t work
like that. Here are just a few of the reasons why not:
- It can take too long to retrieve
the data they need
- The data may need to be summarised for it to make
much sense
- The data may not be organised right for useful analysis
- The information you need isn’t even in the
computer systems!
All of these issues can be addressed
by a well-designed data warehouse.
But what is a data warehouse?
A data warehouse is a database that is specifically
designed to produce information for reporting and analysis.
“Reporting and analysis” can be anything
from traditional paper listings, to hi-tech dashboards,
email alerts, ad-hoc reporting and drill-up, drill-down
analytics. The data is often gathered from several sources
– including non-databases like text files and
spreadsheets – merged into one place, tidied up,
and reorganised into an analysis-friendly structure.
What you find
out
Once you start on building a data warehouse you may
discover all sorts of things about your data. Here are
some of them:
- data is often inaccurate
- data is often inconsistent (it is in two places
but is not the same in both places)
- data is often invalid (for example a date field
contains some strange characters)
- the same data can mean different things to different
people in the organisation
Because of all this you have to go
through processes such as “data mapping”
and “data cleansing” to present a clear,
consistent view of your data.
The data you might need for reporting
and analysis can also be inaccessible – maybe
it is in the wrong format or blocked by security. These
issues must also be overcome before the data can be
made available for reporting.
Other important things that might happen
to the data to make it report-friendly include:
- summarising it
- performing calculations
- reorganising it
(see also my website www.acestar.co.uk/Data-Warehousing-Overview.htm
for a short description of a data warehouse.)
Why bother?
In order for a data warehouse project to be a worthwhile
venture, the benefits must outweigh the costs. Fortunately
the benefits can be pretty impressive if not essential.
A 2009 study by the Aberdeen Group* concludes that
“companies actively building
and using dashboards are able to gain visibility into
metrics that are driving their business, and as a result
are achieving drastically (my italics) higher
performance than their peers.”
As recent studies from Business Intelligence
research companies such as The Aberdeen Group, Forrester
and Gartner show, the competitive benefits of being
able to analyse your company’s (and other) data
are becoming more and more valuable and necessary. Data
warehousing can be a vital component of this analysis.
Data warehouse design - dimensions and facts
Probably the most useful way to reorganise the data
for reporting and analytics is to split it up into two
types of tables known as dimensions and facts.
Dimensions: Dimensional
information is typically descriptive. Descriptive information
is typically alphanumeric, although not always . For
example customer name and address are descriptive and
would be stored in a Customer dimension table.
Another very common dimension contains
a set of calendar dates and is known – not surprisingly
– as a Date dimension. Though it may seem a no-brainer,
a date dimension can be a very useful place for storing
all sorts of date variants such as accounting week number
and various date formats – so the same date might
be stored as a long date (month is spelled out in full)
and a short date (dd/mm/yy – or mm/dd/yy).
Facts: Facts are usually
numeric, for example the total sales recorded against
the customer on a particular date.
Dimensions and facts - putting
it all together
Putting the above information together we are able to
produce a report which contains
Customer
Date
Total sales
Even this is a good start, because
we can now produce reports showing total sales, broken
down and sorted by date and customer. Sales held at
a daily level in the data warehouse can be summarised
to show total sales by customer by month.
Adding more dimensions
As we add more dimensions to the data warehouse we can
break down the report in ever more sophisticated ways.
For example, we might have a Sales Team dimension which
includes Country, Region, Sales Area and Sales Team.
Now we can sort, group or summarise our report by any
of the following:
Customer
Date
Total Sales
Country
Region
Sales Area
Sales Team
The data warehousing process
The diagram below shows a typical data warehouse build.
Data is gathered from one or more sources then loaded
into the data warehouse in a process known as ETL –
extraction, transformation, load.
In the extraction phase
data is gathered from various sources and usually placed
in a ‘staging’ database.
In the transformation phase
the data is verified, then manipulated in various ways
to get it into the right shape for the data warehouse
design.
In the load phase
the transformed data is loaded into the data warehouse
ready for use.
Notice that the process is repeated,
usually on a daily basis to reflect changes and additions
to the underlying data.

What are the alternatives?
You can of course get reports without having to build
your own data warehouse. You could report the data direct
from your computer systems, or make a copy of the data
to use specifically for reporting. Here are some of
the options:
1. Get the data direct from
source. That usually means the computer system
it was originally stored in, say a financial or customer
relationship management (CRM) system.
Advantages:
- Cost savings as little or no development is required
Disadvantages:
- Might slow down other system users
- Nowhere to do calculations useful for reporting
- Not designed for dimensional reporting
- Performance may be compromised
- Data organisation may be haphazard and unplanned
- Vital data might not be available to the reports
Conclusion: This is generally not the
best solution for BI reporting, although it can work
in some situations, for example where there are few
users of the systems (assuming access to the data is
permitted), limited reporting requirements and only
simple manipulation of the data needed for reporting.
2. Use a data repository.
This basically means copying the data you are interested
in away from your transactional systems so it can be
used exclusively for reporting. Sometimes the copy process
is built into applications you may already be using.
Advantages:
- Performance is improved by the ability to tune the
repository database for performance (for example by
adding indexes) and also not having to compete with
users of the transaction system
- There is often a limited scope for
enhancing the data repository by adding new fields and
possibly calculations
Disadvantages:
- This is still a halfway house to having your own data
warehouse. It tends to work best when all the data you
need comes from a single system, for example a CRM system.
As such you don't get the degree of reporting flexibility
that having your own purpose-built data warehouse offers
- Little or no scope for multiple data
sources
- Data is not structured for reporting
- Data cleansing often nonexistent
or less thorough so results can be inconsistent or inaccurate
Conclusion: A data repository will
address some of the issues of Business Intelligence
reporting at a reduced cost. However, this type of system
leaves little room for developments to meet future requirements.
3. Use on-demand Business Intelligence
Also known as software-as-a-service (Saas), a growing
number of companies are offering data analysis online,
on a subscription basis. This is something I talked
about in Newsletter #1. It’s a fast-growing market
and one that can only continue to grow for the foreseeable
future. In brief:
Advantages:
- Low start-up costs
- Provides a test bed for new ideas
- Could be useful for small dynamic requirements
- Could be useful for ad-hoc reporting
Disadvantages:
- Newness
- Security concerns
- Loss of control
- Data used might be unreliable
- Ongoing costs may rise
Conclusion: While many organizations
large and small are already using Saas as part of their
overall solution, most are probably testing the waters
with this delivery method. Smaller organizations –
for whom data warehouse development would be too expensive
– may see Saas as an alternative to a data warehouse
solution, mainly to provide graphical analysis such
as charts and dashboards.
What are the benefits of having a data warehouse?
Now down to the nitty-gritty, as they say here in England.
The benefits of data warehouse are many, and include
the following:
- Flexibility and extendibility. Most
of the effort in producing a data warehouse is in the
initial design and build. Once built, it is relatively
easy to add dimensions and facts to the existing design.
Because you own the design, you can make it into anything
you want. The scope for ad-hoc on-demand reporting is
limitless
- Speed of reporting. You don’t need to compete
with other (external) users for processing power. You
can speed up reporting if necessary because you control
the environment
- Security. You have control over data
security
- Control over your data. Reporting
data is separate from transactional data, consistent
and accurate. Data can be merged from various different
sources, for example CRM systems, financial systems
and even data held on spreadsheets
- Ability to create your own KPIs (key
performance indicators). You can do calculations on
your data 'offline' so the results are waiting for you
when you need them
- May be the only alternative. Sometimes
you cannot produce reports any other way. For example,
maybe you need to combine data held in so many places
that you just don't have the processing power to do
it. You simply have to combine the data to begin with,
usually in a data warehouse
Tried-and-tested solution. You are
less likely to run into development difficulties when
using proven data warehousing technologies
What are the disadvantages?
The disadvantages of data warehousing largely come down
to cost. Not just the cost of building it but also of
maintaining and administering it. All of the functions
listed below not only cost money but also require highly-skilled
people to put them into action:
- Requirements gathering and analysis
- Data warehouse design
- Building the data warehouse
- Testing the data warehouse
- Administering the data warehouse
To sum up
BI professionals nearly always recommend a data warehouse,
and so would I wherever practical. It's not just because
– let's face it – we like building them.
It's also because since we work with them all the time,
we know that many of the benefits of having a data warehouse
are long-term. Once you've made the investment you'll
be glad you did providing you make good use of it.
However, other alternatives exist,
which can work well when employed in addition to or
even instead of your own data warehouse. These alternatives
are – without doubt – becoming more interesting!
*Executive Dashboards –
The key to unlocking double-digit profit growth.
Lock, M (2009)
|