

Monday, 27 June 2022

Join the Data Warehouse Bandwagon Now

Introduction

An operational database used by a business is optimized for the quick inserts and updates required by day-to-day transactions. Such a database is referred to as an OLTP (Online Transaction Processing) database.

An OLTP database is not suitable for business intelligence or analysis, for several reasons:

  1. It is highly normalized, so analytical queries are complex to write and run inefficiently;

  2. It serves many concurrent users, so running business intelligence or analysis tools against it would put additional strain on the database - slowing it down, or even bringing it down under very heavy queries; and

  3. Operational databases may be disparate and may not communicate well in their original form, making analysis difficult.
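To make point 1 concrete, here is a minimal sketch - using Python's built-in sqlite3 module, with made-up tables and figures - of the same question ("revenue per category") answered first against a normalized OLTP schema and then against a denormalized copy:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Normalized OLTP schema: three tables linked by foreign keys.
cur.executescript("""
CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE products   (id INTEGER PRIMARY KEY, name TEXT, category_id INTEGER);
CREATE TABLE orders     (id INTEGER PRIMARY KEY, product_id INTEGER, amount REAL);
INSERT INTO categories VALUES (1, 'Books'), (2, 'Food');
INSERT INTO products   VALUES (10, 'Novel', 1), (20, 'Bread', 2);
INSERT INTO orders     VALUES (100, 10, 12.0), (101, 20, 3.0), (102, 10, 8.0);
""")

# OLTP query: every analytical question needs joins across the normalized tables.
oltp = cur.execute("""
    SELECT c.name, SUM(o.amount)
    FROM orders o
    JOIN products p   ON p.id = o.product_id
    JOIN categories c ON c.id = p.category_id
    GROUP BY c.name ORDER BY c.name
""").fetchall()

# Denormalized copy: the category sits right next to the measure.
cur.executescript("""
CREATE TABLE sales_dw (category TEXT, amount REAL);
INSERT INTO sales_dw
SELECT c.name, o.amount
FROM orders o
JOIN products p   ON p.id = o.product_id
JOIN categories c ON c.id = p.category_id;
""")
dw = cur.execute(
    "SELECT category, SUM(amount) FROM sales_dw GROUP BY category ORDER BY category"
).fetchall()

print(oltp)  # [('Books', 20.0), ('Food', 3.0)]
print(dw)    # same answer, one table, no joins
```

Both queries return the same totals, but the denormalized version is a single-table aggregate - exactly the shape of query a warehouse is built for.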


Therefore, businesses need to find a way of separating analytical systems (OLAP) from transactional systems (OLTP). This is where the data warehouse comes in.

Some businesses choose to simply create an offline copy of the operational databases and use that copy for business intelligence. In such a case, the offline copy only takes the query load off the operational database; it still suffers from all the other drawbacks above.


The advent of Data Warehouse

The DW is an important component of Business Intelligence (BI). It is a database derived from operational databases (source transaction systems) for purposes of analysis and reporting. A DW is a central repository of current and historical data from disparate databases. It also makes queries easier to write, because its schema is far less normalized than that of an operational database.

A DW is a single data repository where multiple data sources are integrated for online analytical processing (OLAP).

Where a DW focuses on only one function or department within an organization, it is instead referred to as a Data Mart.


Designing a DW

The process of designing a DW involves:

  1. Requirements specification: identifying business needs from users and stakeholders to come up with a requirements specification document. This is similar to the Requirements Specification of the software development process.

  2. Modelling: organizing the business needs in terms of facts and dimensions, and relationships. Facts are business measures or metrics, while dimensions are the business categories or references used for analysis of those facts. Star, Snowflake and Galaxy models are the three main modelling techniques used in DW. This stage is similar to the Design stage in software development life cycle.

  3. Extract, Transform, Load (ETL) or its variant ELT: Extract data from various sources and put them in a data lake. Then Transform and integrate the data before loading into the DW. Some ETL tools are SSIS, SAS Data Management, IBM Information Server, Hive, etc. In this stage, the ETL pipeline is developed using an appropriate ETL tool.

  4. OLAP cubes: an OLAP cube (also called a multidimensional cube or hypercube) is a data structure optimized for fast analysis along the multiple dimensions that define a business problem.

  5. User Interface (UI): to allow the end-user to use the DW, a front-end UI can be built with tools such as Tableau, PowerBI, BigQuery, etc.

  6. Testing and Deployment: The DW needs to be tested before deployment. Testing ensures that user requirements are met by the system, and that there are no bugs. Once users have accepted the system, then it can be deployed.

  7. Training: users need to be trained to use the system - end-users on the front-end UI, and technical teams on system maintenance.

  8. Maintenance: on an ongoing basis, we may need to update the data model, UI applications, etc., to meet new user requirements.
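As a rough illustration of the Star model from step 2, the sketch below builds a tiny star schema - one fact table surrounded by two dimension tables - using Python's built-in sqlite3 module. All table names, columns, and figures are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
-- Dimension tables: business categories/references used for analysis.
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
-- Fact table: foreign keys to each dimension plus numeric measures.
CREATE TABLE fact_sales  (date_key INTEGER, product_key INTEGER,
                          units INTEGER, revenue REAL);
INSERT INTO dim_date    VALUES (20220601, 2022, 6), (20220701, 2022, 7);
INSERT INTO dim_product VALUES (1, 'Novel', 'Books'), (2, 'Bread', 'Food');
INSERT INTO fact_sales  VALUES (20220601, 1, 2, 20.0),
                               (20220601, 2, 5, 15.0),
                               (20220701, 1, 1, 10.0);
""")

# A typical OLAP-style query: aggregate a fact measure by dimension attributes.
rows = cur.execute("""
    SELECT d.month, p.category, SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_date d    ON d.date_key = f.date_key
    JOIN dim_product p ON p.product_key = f.product_key
    GROUP BY d.month, p.category
    ORDER BY d.month, p.category
""").fetchall()
print(rows)  # [(6, 'Books', 20.0), (6, 'Food', 15.0), (7, 'Books', 10.0)]
```

The Snowflake model would further normalize the dimension tables (e.g. splitting category out of dim_product), while the Galaxy model would share these dimensions across several fact tables.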



Updating a DW

Data is uploaded into a DW from operational systems (OLTP), which are - by design - not optimized for reporting and analysis. In a DW, data is optimized for analysis and reporting (OLAP).

A DW is built using two approaches:

  1. Extract, Transform, Load (ETL) and
  2. Extract, Load, Transform (ELT)


In ETL, there is a separate staging area where data cleaning and transformation are done before loading into the DW. The staging database stores raw data extracted from each of the disparate source data systems. The raw data from disparate sources is then transformed, integrated, and finally loaded into the DW.

In the case of ELT, data is first loaded (as-is) into the DW. It is then transformed and integrated in a staging area inside the data warehouse itself.
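The difference between the two approaches is purely one of ordering, as this toy Python sketch illustrates (the extract/transform/load functions are hypothetical stand-ins for real pipeline stages):

```python
def extract(sources):
    """Pull raw rows from each source system (here, in-memory lists)."""
    return [row for source in sources for row in source]

def transform(rows):
    """Clean and integrate: drop incomplete rows, normalize names and types."""
    return [
        {"name": r["name"].strip().title(), "amount": float(r["amount"])}
        for r in rows
        if r.get("name") and r.get("amount") is not None
    ]

def load(warehouse, rows):
    """Append rows to the warehouse (here, a list standing in for a table)."""
    warehouse.extend(rows)
    return warehouse

sources = [
    [{"name": " alice ", "amount": "10"}, {"name": "", "amount": "5"}],
    [{"name": "bob", "amount": "7.5"}],
]

# ETL: transform in a separate staging area, then load only clean data.
etl_wh = load([], transform(extract(sources)))

# ELT: load the raw data first, then transform inside the warehouse.
elt_raw = load([], extract(sources))
elt_wh = transform(elt_raw)

print(etl_wh)  # [{'name': 'Alice', 'amount': 10.0}, {'name': 'Bob', 'amount': 7.5}]
```

Either way the warehouse ends up with the same integrated data; what differs is whether raw data ever lands inside the warehouse before being cleaned.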

A data warehouse can be accessed by end-users through Business Intelligence tools (reporting frameworks, visualization), SQL clients, spreadsheets, etc.


Conclusion

In this era, a business cannot continue to thrive while using a transactional (OLTP) database for business intelligence, given the cost of doing so (see the Introduction). A DW solves this challenge, and a team of data professionals can readily build one.

Moreover, as many businesses move to the cloud, they can now take advantage of cloud-based DW and analysis tools such as Amazon Redshift (PostgreSQL), Microsoft Azure SQL Data Warehouse (SQL Server), Google BigQuery (ANSI SQL), Snowflake, etc.

Snowflake is a fully managed Massively Parallel Processing (MPP) cloud-based data warehouse that runs on AWS, GCP, and Azure; it is the only one of these solutions that does not run on its own cloud.

It is now easier than ever before to leverage cloud technologies in the DW space.
The cloud solutions will vary in terms of pricing, scalability, architecture, security features, speed, etc.

You either join the DW bandwagon or ship out.
