A Beginner’s Guide To Data Warehousing

Almost every process and decision concerning data and information is encompassed by Big Data. From cloud computing, business intelligence and corporate intranets to enterprise data management, content data management, data mining and data warehousing, Big Data is behind the success of it all.

Let’s focus on data warehousing. Most of you have probably heard of data warehousing but maybe you’re unsure about what it really is, what it does and if your company or organisation really needs it. With so many tech terms and processes out there today, it can get really confusing to understand which is which. In this article we will explain what a data warehouse is and what it can do for your business. We will find out the advantages and disadvantages and provide you with the knowledge to decide if data warehousing is for you.

Image credit: http://www.visamtech.com/
Image credit: http://www.visamtech.com/

Why Use Data Warehousing?

Not many companies have their own data warehouse, however, storing information in a well-designed and well-implemented environment is crucial for good business decision making and reaching company goals.

The data warehouse concept is quite simple. Data is acquired on a periodic basis from source systems like ERP systems that contain essential company information. Then, the data from these systems is moved to a dedicated server that contains a data warehouse. When the data is moved it is also cleaned, formatted, validated, reorganised, summarised and then supplemented with data from other sources. A data warehouse will contain the main source of information required for analysis which can then be used by reporting tools in ad-hoc queries, dashboards and canned reports. Building data warehouses has become easier over the years because of the improvements in the tools and processes involved as well as a better understanding of data warehouse architecture. There are also many qualified consultants who can assist in building and implementation.

What Is Data Warehousing?

A data warehouse is a database designed to enable organisations and businesses to acquire intelligence. It exists to help users understand and enhance the performance of their business. Data warehouses are environments that extract, transform and transport particular data and assist in loading solutions, reporting, statistical analysis, client analysis tools, data mining processes and other data gathering processes. The result is insightful, useful and actionable information which can be delivered to business users.

Data warehousing can also be defined as a database that is designed to enable B.I. (business intelligence) activities. It is implemented to help users understand information and enhance an organisation’s overall performance. It is designed for analysis and querys rather than transaction processes. It also contains historical data acquired from transactional data and other sources. Data warehouses separate transaction processes from analysis workloads and it enables the business to consolidate data from several sources including internally developed systems, 3rd party data extractors, purchased applications and other reliable sources.

A Short History Of Data Warehousing

The practice of data warehousing kicked off in the late 1980s. The IBM Systems Journal published an article in 1988 titled “An Architecture For A Business Information System” which mentioned the term “data warehouse”. Although, the first use of the term was most likely from Bill Inmon, the “Father of data warehousing”, in the 1970s. He was the first to discuss the principles surrounding data warehousing and he worked extensively as a data professional and honed his expertise on everything related to relational data modelling.

Inmon’s work as a data warehousing engineer took off in the 1990s, when he formed his first company, Prism Solutions. One of the first Prism products was the Prism Warehouse Manager. It was one of the first industry tools designed for creating and managing data warehouses. In 1992, Inmon published Building Data Warehouses, which continues to be an important read for many data professionals as it tackles the theoretical background and real-world examples of data warehouse management and creation.

Bill Inmon. Image credit: http://damaottawa.wildapricot.org/
Bill Inmon. Image credit: http://damaottawa.wildapricot.org/

In the 1990s, he developed the concept of the “corporate information factory”. It is an enterprise-level view of an organisation’s data where data warehousing plays an essential part. His approach to data warehousing design is focused on a centralised depository modelled on the 3rd normal form. He felt that using strong relational modelling would lead to enterprise-wide consistency that facilitated easier creation of individual data marts to better serve the needs of different departments using the data.

Ralph Kimball on the other hand has a different approach on data warehousing. Inmon presented a robust theoretical background for the concepts surrounding data warehousing, however, Kimball’s The Data Warehouse Toolkit (1996) included a host of industry-honed, practical examples for OLAP-style modelling. Kimball’s career started in the 1970s, when he was a key designer of the Xerox Star Workstation which is the first computer to use a mouse and Windows operating system. During the 1980s, he gained wider exposure in the field of decision support systems when he became Vice President of Metaphor Computer Systems. When he founded Red Brick Systems in 1986, a full-fledged data warehousing application was one of their major products. The company was known for its relational model suitable for high speed data warehousing applications and solutions. He left red Brick Systems in 1992 and started his own consultancy firm called which at present is part of the Kimball Group. Kimball focused on the development of individual data marts at the departmental level that are integrated using the Information Bus architecture.

Ralph Kimball (L) @Kimball University. Image credit: http://blog.professorcoruja.com/
Ralph Kimball (L) @Kimball University. Image credit: http://blog.professorcoruja.com/

Data Warehouses and OLTP

Data warehouses differ form OLTP or online transaction processing systems. With a data warehouse, you separate the analysis workload with the transactional workload, meaning data warehouses are more of a read-oriented system. The system has a far higher amount of data reading as compared to writing and updating, which is why it will enable a better analytical performance and avoids impacting the transaction systems. Data warehouse systems can be enhanced and optimised to consolidate data from different sources in order to achieve key goals. This is of great value to many organisations because it can prevent corporate disputes and improves decision-making for those in upper management.

Traditionally, data warehouses would store months or years worth of data in support of historical analysis. The stored data is loaded through an ETL or extraction, transformation and loading points from multiple sources. The reliability and speed of ETL operations are the pillars of the data warehouse once implanted and running. Users of data warehouses perform data analysis processes which are normally time-related. Inventory analysis, time figures, profit by product and profit by customers are some of the best examples. In some cases, users may need highly aggregated data to complete processes. More sophisticated data warehousing systems can include data mining, trend analysis and use the existing data to forecast market trends and predict business opportunities.

Data Warehouse and Data Mart

A data mart is almost the same as a data warehouse, but limited in scope. A data mart may only serve a specific department or a specific line of business, but it can be built faster than a data warehouse because of its limited coverage. Although it is faster to build, it has issues with consistency and it takes time, patience and discipline to keep data and calculation definitions consistent across all data marts.

Data Warehousing and Business Intelligence

These two terms are a common source of confusion within and outside of the IT community. To make it simpler, data warehousing is the technology used to create a repository of data. Data warehousing is made of two primary elements: database and hardware. Data warehouses are multiple data tables and databases used to store information only limited by the storage capacity of the hardware. Hardware could include physical servers, hard drives and processors. These data warehouses are usually accessed through shared networks or an intranet. A data architect is generally responsible for setting up the database architecture, the structure and process management.

Business intelligence refers to the application and tools used in the interpretation of data and data analysis. Business intelligence can also be described as analytical software. Business intelligence And data warehousing can work together to provide and support a user dashboard that offers customised information to any user. Software solutions included in business intelligence includes the ability to manage large data sets, generate detailed reports and query creation.

The main distinction between these 2 solutions is their purpose. While data warehouses are designed for holding, storing, and supporting transactional data, business intelligence is the solution used to access and review data.

Data Warehousing and Data Mining

Data mining and data warehouses are business intelligence tools used to turn raw data into actionable information. Data mining is essentially a statistical analysis process. Data mining or data discovery is the process of analysing data from different data sources and converting it into useful and actionable information. This information is used in increasing revenue, cutting costs or both. The process also allows users analyse data from many different angles, categorise that data and then summarise the identified relationships. It can also be said that data mining is a process to find the correlations or patterns among different files in large relational databases. A data mining solution analyses patterns and relationships in stored transaction data based on open-ended user queries.

The Basics Of Data Warehousing

The basic idea of data warehousing is to create permanent storage for data required for analysis, reporting and other business intelligence functions. Although some experts say that it’s a waste to store data in multiple places, the many advantages outweigh the disadvantages. Data warehouses are placed on servers dedicated to the function of running  DBMS or database management systems like SQL server and SQL server integration services to pull data from different source systems and into the data warehouse. Here are just some of the reasons why you may need data warehousing:

  • Users need to integrate the different sources of data in real-time. It allows users to make better business decisions because they have access to more data. It will also save users time when retrieving data from multiple sources.
  • Users have large amounts of historical data which needs to be gathered in an accessible place with common formats, a common data model, common keys and common access methods.
  • Users are required to keep historical records, eepecially if the source transaction systems do not keep the records.
  • Users need to reconstruct the data and rename the fields and tables to make more sense to users.
  • Users are required to use master data management to consolidate many tables into one table.
  • Users are required to consolidate data for each department to produce results that are in line with other departments providing consistency to the system and to the other users.

Data Warehouse Architectures

There are three types of data warehouse architectures:

1. Data Warehouse Architecture: Basic

It’s a simple architecture for data warehouses where end users can directly access data derived from several source systems through the data warehouse.

Image credit: http://docs.oracle.com/
Image credit: http://docs.oracle.com/

2. Data Warehouse Architecture: Staging Area

This type of architecture simplifies the building summaries and general warehouse management.

Image credit: http://docs.oracle.com/
Image credit: http://docs.oracle.com/

3. Data Warehouse Architecture: Staging Area and Data Marts

When you want to customise your data warehouse design for different groups within your organisation, you can do it by adding data marts. Data marts are systems designed for specific lines of business.

Image credit: http://docs.oracle.com/
Image credit: http://docs.oracle.com/

Things To Know When Building Or Designing A Data Warehouse

Below are a few key points you should be aware of before creating or designing your data warehouse project:

1. Extraction, Transformation and Loading Requires More Time

Designers often visualise the end data in a top-down approach and then realise the complexity that is associated with data analytics. Most designers have the assumption that it will be easy to map the source data from the existing systems because users may already have a better understanding of the source systems. These assumptions will usually be void when coding the ETL layer to feed the data warehouse. Cleaning, mapping, and data preparation will be more time consuming as compared to the design of an analytics layer. Data warehouse designers and architects prefer to choose the case of data reporting and analytics over the background ETL because it can be presented to upper management if budgeting and costing is considered.

2. The Scope Of Data Warehouse Will Go Up During Development

Most users don’t have any idea of what they want until they see the data. Most often, users will usually want more out of their data when data visualisation, progress and development becomes possible. Most see this as unavoidable and the designers should allocate more time in accommodating ad-hoc requirements. Most of these requirements that were implicit in the beginning becomes explicit and are indispensable during the later phases of the project. Since it is unavoidable, you have to make sure to allocate enough time for your project.

3. You Need To Validate Data

Most source systems do not give accurate data because checks and validations are not made in the source system level. Each time a validation check is skipped, it will send unexpected data to the data warehouse level. Thus, before you process any data in the data warehouse, you need to do some validation checks on your end to make sure of the availability of the expected data.

4. Issues That Need To Be Discovered

Most designers often make the wrong assumption that the source system is error or fault free. This becomes apparent when you discover discrepancies and issues with the existing source systems. This is also the reason why many of them do not give the right time and resources in planning the project and dealing with those issues. Once these issues are discovered, it will take many man hours to get them fixed. In some cases, those issues are not even fixed at the source.

Image credit: www.maineq.com
Image credit: www.maineq.com

5. User Training Is Not Enough

Some say you cannot teach old dogs new tricks and this also applies to existing business solution users who have been using a system for a long time. If you’re trying to implement something new, expect that there will be resistance from many users. In order for you to make the transition as easy as possible, user training is made available for data warehouse users, but you might also notice that such training is not enough. Employees will often need to unlearn many things in order to learn to use the new system.

This is one challenge that can make or break a data warehouse project. No matter how much effort you put into the project, if the users are not using it it correctly it could become a total failure. You have to make sure that all efforts are made to connect the users to the new system and train them to use the new data warehouse.

6. Conflicting Business Rules Among Users

System users usually belong to different departments and are often not aware of what the other departments are doing. When you combine all data from different departments, you will most likely discover that the business rule suggested by one user is quite different from the business rule suggested by another user. There should be involvement and collaboration among users and departments to come to one conclusion for using the system.

7. Misjudgement On Volumetrics

Volumetric estimates that are not up to standard in the starting phase of the data warehouse project will eventually go haywire. This can occur if there is a slight change in the business standard metrics.

8. The Cost Of Data Warehousing

Maintaining data warehouses can cost your a business a lot during the development and deployment stages of the system. Software licensing, regular data purging, server maintenance and database maintenance all add up. You have to set the right expectations from the start of the project about the possible huge capital you need to get things rolling smoothly.

9. Refresh Time Is Top Priority

Data should be uploaded or updated in the data warehouse at least once every day. Loading the latest data will ensure that the reports are fresh and up to date. Although this is good practice, the time needed to load data will eventually become longer than what you initially expected or calculated. This will also increase because of your day-to-day operations.

Performance considerations are essential to make your data warehouse a success. You do all the necessary things to make your system perform as expected, or else everything will just be a waste of effort and money.

Advantages and Disadvantages Of Data Warehousing

As you know, a successful data warehouse is the backbone of a successful business intelligence program. With all the challenges mentioned above, you may think that having a solid data warehousing project is a waste of capital especially if you can’t see the advantages. For data warehousing, you will not know the advantages and disadvantages until the system is up and running.

Advantages include:

1. Provides Enhanced Business Intelligence

With data warehousing there will be no need to make business decisions based on limited information because you will have access to various data providers. Data warehouses and business intelligence can be directly applied to many business processes including inventory management, marketing segmentation, sales and managing finances.

 2. A Data Warehouse is a Time Saver

Businesses and enterprises will be able to easily access critical data from a number of sources through a central depository. Because of this, upper management can quickly make informed and accurate decisions on many key initiatives. There will be no time wasted on data retrieved from multiple sources. Plus, users can access any query with little or no support from the IT department, saving time and money.

3. Data Warehouses Enhance the Consistency and Quality Of Data

Raw data is housed within the data warehouse from different source systems into a common format for the enterprise. Once filtered, each piece of data from the different departments will be standardised and each department will produce results that are in line with the other departments. Data accuracy will become a reality; accurate data is the foundation of many strong business decisions.

4. Historical Intelligence

Data warehouses hold a large amount of historical data that can be analysed to identify trends which can inform business decisions.

5. Enhances ROI

Data warehousing and business intelligence go hand in hand and are proven to generate more revenue and savings as compared to organisations that don’t take advantage of data warehousing and business intelligence.

Image credit: www.infowarets.com
Image credit: www.infowarets.com

Despite the advantages, building and implementing a data warehouse is not a walk in the park and there are always disadvantages to be dealt with.

Disadvantages include:

1. Extra Work On Reporting

Data warehouses risks creating extra work for different departments. This depends on the size of the organisation. Each type of data needed in the warehouse has to be made by the IT teams in each business division. It can be duplicated from an existing database, but in most cases new data gahered from customers or employees needs to be formatted.

2. The Cost and Benefit Ratio

As mentioned earlier, a data warehousing project costs money and requires many hours. For some, it’s just a waste of resources to generate a tool that doesn’t get used enough to justify the expense.

3. Concerns of Data Ownership

Data warehouses are most often, but not always, cloud services, applications or SaaS implementations. Many believe that the data  security in a data warehouse is only as good as the cloud vendor. Even if implemented locally, there will be concerns about data access and ownership. You as a business will have to make sure the people handling analysis processes are trustworthy because any security leak involving your company data would be a nightmare.

4. Flexibility of Data

Data warehouses usually contain static data with minimal ability to operate with specific solutions. Data is imported and then filtered using a schema, but will actually take days or weeks to be used. Other experts also say that data warehouses are subject to ad-hoc queries and are very difficult to streamline for query and processing speed.

The success of data warehousing will depend on solid enterprise integration. It doesn’t matter if you follow Inmon or Kimball data warehouses best practices, proper integration of the organisation’s overall data architecture is the most essential key factor when developing a functioning data warehouse.


Please enter your comment!
Please enter your name here