The Lowdown On Data Warehousing: Architecture, Concepts and Phases 

A data warehouse is the foundation of a successful business intelligence program. The concept is straightforward: create a central, permanent store for the data from different sources that an organisation needs for reporting, analysis and other business intelligence functions. A data warehouse is designed for query and analysis rather than transaction processing, and it holds historical data alongside data from other sources. It also separates the analysis workload from the transaction workload, which lets an organisation collect and consolidate data from different sources. Keeping these historical records and analysing them helps users understand and improve the organisation’s performance.

Key characteristics of data warehousing include:

  1. Large amounts of historical data are used
  2. The collected data is structured for simplicity, easy access and high-speed query performance
  3. Data loads involve transformations and multiple sources
  4. Predefined and ad hoc queries are both common
  5. Queries retrieve large amounts of data
  6. End users can be time-sensitive and need speed-of-thought response times

Is There Any Difference Between A Database, Data Mining and Data Warehousing?

Although the three may seem similar, there are differences between them. Experts say that all data warehouses are databases, but not all databases can be data warehouses.

A database is a structured collection of records stored in a computer system, commonly organised into tables. A user or a computer program can consult it using a query language, and the answers retrieved are information that can be used in making decisions. It’s an integrated collection of logically related files, consolidated into a common pool that serves one or more users.

A data warehouse is a database set up specifically to hold large amounts of data for reporting purposes. Data from several systems is merged to present an enterprise-wide view. A data warehouse also keeps a long history, from a few years to the whole lifetime of the company, so that long-term trends can be seen.

Data mining, on the other hand, is the process of finding patterns in a given data set. The patterns usually provide meaningful insight to whoever needs or is interested in that data. Data mining is applied in many functions and contexts, such as fraud detection, marketing campaigns and customer research by supermarkets. For example, suppose your credit card company alerts you that your card may be in use by someone other than you. The company has a history of your purchases, including where you made them. If a purchase falls outside your usual purchasing routine, you are alerted. Data mining makes this possible.
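
The credit card example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the purchase history, the `is_unusual` function and both thresholds are hypothetical, and real fraud detection relies on far richer models than a simple share-and-average rule.

```python
from collections import Counter

# Hypothetical purchase history for one cardholder: (city, amount) pairs.
history = [
    ("Manila", 42.0), ("Manila", 18.5), ("Manila", 60.0),
    ("Cebu", 25.0), ("Manila", 33.0), ("Manila", 51.0),
]

def is_unusual(purchase, history, min_city_share=0.1, amount_factor=3.0):
    """Flag a purchase made in a rarely seen city or for an unusually large amount.

    Both thresholds are illustrative assumptions, not industry values.
    """
    city, amount = purchase
    city_counts = Counter(c for c, _ in history)
    # Share of past purchases made in this city (0.0 if never seen before).
    city_share = city_counts[city] / len(history)
    average_amount = sum(a for _, a in history) / len(history)
    return city_share < min_city_share or amount > amount_factor * average_amount

print(is_unusual(("Manila", 40.0), history))  # usual city, usual amount -> False
print(is_unusual(("Berlin", 40.0), history))  # city never seen before -> True
```

The pattern here (where and how much the cardholder usually spends) comes entirely from the stored history, which is why the quality of the collected data matters so much.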

Remember, too, that data warehousing comes before data mining. The mining process relies on the data compiled in the data warehousing stage to find useful and specific patterns.

Data Warehouse Architectures

There are three common types of data warehouse architecture: basic, with a staging area, and with a staging area and data marts.

1. Data Warehouse Architecture: Basic

This is a simple architecture in which end users directly access data derived from several source systems through the data warehouse.

2. Data Warehouse Architecture: with a Staging Area

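This type of architecture adds a staging area where operational data is cleaned and processed before it goes into the warehouse, which simplifies building summaries and general warehouse management.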

3. Data Warehouse Architecture: with a Staging Area and Data Marts

When you want to customise your warehouse’s architecture for different groups within your organisation, you can add data marts. A data mart is a system designed for a specific line of business.
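
As a rough illustration of the idea, the Python sketch below uses the standard `sqlite3` module to carve a marketing-only slice out of a warehouse table. The table, view and column names are hypothetical, and a database view is a simplification: in practice a data mart is often a separate store fed from the warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical warehouse table shared by the whole organisation.
    CREATE TABLE warehouse_sales (
        sale_id INTEGER PRIMARY KEY,
        department TEXT,
        product TEXT,
        amount REAL
    );
    INSERT INTO warehouse_sales (department, product, amount) VALUES
        ('marketing', 'campaign A', 1200.0),
        ('finance', 'audit', 800.0),
        ('marketing', 'campaign B', 450.0);

    -- The "data mart": only the rows and columns one group cares about.
    CREATE VIEW marketing_mart AS
        SELECT sale_id, product, amount
        FROM warehouse_sales
        WHERE department = 'marketing';
""")

mart_rows = conn.execute(
    "SELECT product, amount FROM marketing_mart ORDER BY sale_id"
).fetchall()
print(mart_rows)  # [('campaign A', 1200.0), ('campaign B', 450.0)]
```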

Steps In Building A Data Warehouse

In general, building a data warehouse involves these six steps:

1. Getting Transactional Data

As mentioned earlier, a large part of having a data warehouse is getting data from different sources and placing it in a central storage area. This can be the hardest part of building a data warehouse because so much raw data needs to be collected and stored. If you’re building one, you need to decide which database system to use and work out how to pull data from these different sources. Tools are available for extracting this data. Microsoft, for example, has Data Transformation Services (DTS), which is part of the Microsoft SQL Server platform and comes at no extra cost when you purchase the platform (from SQL Server 2005 onwards its successor is SQL Server Integration Services). If you don’t have an ODBC- or OLE DB-compliant data source to work with, you will have to invest in writing a custom program that transfers data from the original source to the staging database.
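
To give a feel for what such a custom program does, here is a minimal Python sketch that copies rows from two sources into one staging table. Everything in it is an assumption made for illustration: the `orders` table, the in-memory flat file, the `stg_orders` table and the `extract_to_staging` helper are hypothetical, and SQLite stands in for whichever database systems you actually use.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: an operational database with an orders table.
source_db = sqlite3.connect(":memory:")
source_db.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer TEXT, total REAL);
    INSERT INTO orders VALUES (1, 'Acme', 250.0), (2, 'Globex', 99.5);
""")

# Hypothetical source 2: a flat-file export, held in memory to keep the sketch self-contained.
flat_file = io.StringIO("order_id,customer,total\n3,Initech,410.25\n")

# The staging database keeps every value as text; cleaning happens in a later step.
staging = sqlite3.connect(":memory:")
staging.execute(
    "CREATE TABLE stg_orders (source TEXT, order_id TEXT, customer TEXT, total TEXT)"
)

def extract_to_staging(staging, source_name, rows):
    """Copy rows unchanged (as text) and record which system they came from."""
    staging.executemany(
        "INSERT INTO stg_orders VALUES (?, ?, ?, ?)",
        [(source_name, str(r[0]), str(r[1]), str(r[2])) for r in rows],
    )
    staging.commit()

extract_to_staging(
    staging, "orders_db",
    source_db.execute("SELECT order_id, customer, total FROM orders"),
)
reader = csv.reader(flat_file)
next(reader)  # skip the header row of the flat file
extract_to_staging(staging, "flat_file", reader)

staged_count = staging.execute("SELECT COUNT(*) FROM stg_orders").fetchone()[0]
print(staged_count)  # 3
```

Recording the source system next to each row is a design choice that makes problems easier to trace back once the data has been merged.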

2. Transactional Data Transformation

Your source systems were probably built by different IT professionals, so each one is likely to differ from the others. The data model of your mainframe may be different from that of the client-server system, so the data has to be transformed before everything can work together. Many companies keep their data in a number of different database management systems, and much of it also sits in flat files, mail systems, spreadsheets and other kinds of data stores. When you build a data warehouse, you need to collect the data from these sources into a staging area where you can reshape it into a single form for storage and distribution.
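
The reshaping can be sketched as one small function per source, each producing the same target form. The field names, date formats and the `from_mainframe` and `from_client_server` functions below are hypothetical examples, not the layout of any particular system.

```python
from datetime import datetime

# Hypothetical raw records from two systems that follow different conventions.
mainframe_rows = [{"CUST_NM": "ACME CORP", "ORD_DT": "20240131", "AMT_CENTS": "25000"}]
client_server_rows = [{"customer": "Globex", "order_date": "31/01/2024", "amount": "99.50"}]

def from_mainframe(row):
    """Map a mainframe record to the common form (ISO date, amount in currency units)."""
    return {
        "customer": row["CUST_NM"].title(),
        "order_date": datetime.strptime(row["ORD_DT"], "%Y%m%d").date().isoformat(),
        "amount": int(row["AMT_CENTS"]) / 100,
    }

def from_client_server(row):
    """Map a client-server record to the same common form."""
    return {
        "customer": row["customer"].strip(),
        "order_date": datetime.strptime(row["order_date"], "%d/%m/%Y").date().isoformat(),
        "amount": float(row["amount"]),
    }

# After transformation, every record has the same keys, date format and units.
unified = [from_mainframe(r) for r in mainframe_rows] + [
    from_client_server(r) for r in client_server_rows
]
for record in unified:
    print(record)
```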

3. Making A Dimensional Model

The dimensional model is composed of fact tables and dimension tables. A fact table is made up of measures and foreign keys to each dimension table. Measures are the numbers that show how well or how poorly the business is doing. Dimensions, on the other hand, are the details that business users expect to see alongside the measures in their reports. You can work out the required dimensions by discussing the business requirements with the users. Note also that not every field you import from the different data sources will fit into a dimensional model.
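
A small star schema makes the split between facts and dimensions concrete. The sketch below uses Python's standard `sqlite3` module; the tables `dim_date`, `dim_product` and `fact_sales`, along with their columns and sample rows, are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: the descriptive details users want to slice by.
    CREATE TABLE dim_date (
        date_key INTEGER PRIMARY KEY,
        full_date TEXT, month INTEGER, year INTEGER
    );
    CREATE TABLE dim_product (
        product_key INTEGER PRIMARY KEY,
        product_name TEXT, category TEXT
    );
    -- Fact table: foreign keys to each dimension, plus the measures.
    CREATE TABLE fact_sales (
        date_key INTEGER REFERENCES dim_date(date_key),
        product_key INTEGER REFERENCES dim_product(product_key),
        quantity INTEGER,      -- measure
        sales_amount REAL      -- measure
    );
    INSERT INTO dim_date VALUES (20240131, '2024-01-31', 1, 2024);
    INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware'), (2, 'Manual', 'Books');
    INSERT INTO fact_sales VALUES
        (20240131, 1, 3, 30.0), (20240131, 2, 1, 12.5), (20240131, 1, 2, 20.0);
""")

# A typical report: a measure (sales_amount) summarised by a dimension attribute (category).
sales_by_category = conn.execute("""
    SELECT p.category, SUM(f.sales_amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
print(sales_by_category)  # [('Books', 12.5), ('Hardware', 50.0)]
```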

4. Data Loading

Once you have created a dimensional model, you have to populate it with data from the staging database. It seems trivial, but it usually involves combining or splitting several columns or fields. You may also need to do some lookups before calculating the values for the dimensional model.
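
The sketch below shows one way the combining and the lookups might look, again with Python's `sqlite3` module. The staging and dimension tables and the `lookup_key` helper are hypothetical. The helper builds SQL from table and column names, which is acceptable here only because those names are fixed in the code and never come from user input.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stg_sales (first_name TEXT, last_name TEXT, product_name TEXT, amount TEXT);
    INSERT INTO stg_sales VALUES
        ('Ada', 'Lovelace', 'Widget', '30.0'),
        ('Alan', 'Turing', 'Widget', '12.5');
    CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, full_name TEXT UNIQUE);
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT UNIQUE);
    CREATE TABLE fact_sales (customer_key INTEGER, product_key INTEGER, sales_amount REAL);
""")

def lookup_key(conn, table, key_col, name_col, value):
    """Return the key for value in a dimension table, adding a new row if it is missing."""
    row = conn.execute(
        f"SELECT {key_col} FROM {table} WHERE {name_col} = ?", (value,)
    ).fetchone()
    if row:
        return row[0]
    return conn.execute(
        f"INSERT INTO {table} ({name_col}) VALUES (?)", (value,)
    ).lastrowid

for first, last, product, amount in conn.execute("SELECT * FROM stg_sales").fetchall():
    full_name = f"{first} {last}"  # combine two staging columns into one field
    customer_key = lookup_key(conn, "dim_customer", "customer_key", "full_name", full_name)
    product_key = lookup_key(conn, "dim_product", "product_key", "product_name", product)
    conn.execute(
        "INSERT INTO fact_sales VALUES (?, ?, ?)",
        (customer_key, product_key, float(amount)),  # staged text becomes a number
    )
conn.commit()

fact_rows = conn.execute(
    "SELECT customer_key, product_key, sales_amount FROM fact_sales ORDER BY customer_key"
).fetchall()
print(fact_rows)  # [(1, 1, 30.0), (2, 1, 12.5)]
```

Both sales point at the same `dim_product` row, because the second lookup finds the key that the first one created.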

5. Generating Pre-Calculated Summary Values

This step is generally referred to as aggregation. After you populate your dimensional database, SQL Server Analysis Services will do the aggregating for you, but note that it may take a long time, and the size of each dimension influences how long. Whichever dimensional model you use, make sure that your SQL Server machine has as much memory as possible. The more memory you have on your server, the less time it will take to build your aggregate values.
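
Conceptually, an aggregation is a summary computed once and stored so that reports do not have to scan every fact row. The Python sketch below illustrates the idea with SQLite as a stand-in. It does not show how SQL Server Analysis Services works internally, and the `fact_sales` and `agg_monthly_sales` tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact_sales (year INTEGER, month INTEGER, sales_amount REAL);
    INSERT INTO fact_sales VALUES (2024, 1, 30.0), (2024, 1, 12.5), (2024, 2, 20.0);

    -- Pre-calculated summary: one row per month instead of one row per sale.
    CREATE TABLE agg_monthly_sales AS
        SELECT year, month,
               SUM(sales_amount) AS total_sales,
               COUNT(*) AS sale_count
        FROM fact_sales
        GROUP BY year, month;
""")

# Reports read the small summary table rather than the full fact table.
monthly = conn.execute(
    "SELECT year, month, total_sales, sale_count "
    "FROM agg_monthly_sales ORDER BY year, month"
).fetchall()
print(monthly)  # [(2024, 1, 42.5, 2), (2024, 2, 20.0, 1)]
```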

6. Purchasing A Reporting Tool

Once you have a dimensional database and the necessary aggregations, it’s time to choose your preferred reporting tool. Several vendors offer solutions, and Microsoft’s suite is one that many administrators prefer. You can build your own custom reporting tool, but it will cost more and take longer to build and roll out.

Ask experienced administrators, trusted vendors and businesses that run their own data warehouses about the fine points of building one. This will help you understand whether data warehousing is right for you.

