Wednesday, August 4, 2010

Data Warehousing

A data warehouse is a repository of an organization's electronically stored data, designed to facilitate reporting and analysis.

It is typically a relational database designed for query and analysis rather than for transaction processing. It usually contains historical data derived from transaction data, but it can also include data from other sources.

Definition:

A data warehouse is a central repository for all or significant parts of the data that an enterprise's various business systems collect. The term is commonly attributed to W. H. Inmon.

Data from various online transaction processing (OLTP) applications and other sources is selectively extracted and organized on the data warehouse database for use by analytical applications and user queries.
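
The extract, organize and load flow described above can be sketched in a few lines of Python. This is only a minimal illustration using the standard sqlite3 module with two in-memory databases; the table names (orders, fact_sales), the columns and the sample rows are all hypothetical and stand in for whatever your OLTP application and warehouse actually use.

```python
import sqlite3

def run_etl(oltp, warehouse):
    """Copy completed orders from a hypothetical OLTP table into a warehouse table."""
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS fact_sales ("
        "order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    # Extract: select only the rows the analysts care about.
    rows = oltp.execute(
        "SELECT id, customer, amount_cents FROM orders WHERE status = 'completed'"
    ).fetchall()
    # Transform: tidy up customer names and convert cents to currency units.
    cleaned = [(oid, cust.strip().title(), cents / 100.0) for oid, cust, cents in rows]
    # Load: write the organized rows into the warehouse.
    warehouse.executemany("INSERT OR REPLACE INTO fact_sales VALUES (?, ?, ?)", cleaned)
    warehouse.commit()
    return len(cleaned)

# Hypothetical OLTP source with a few sample orders (assumed data).
oltp = sqlite3.connect(":memory:")
oltp.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, "
    "amount_cents INTEGER, status TEXT)"
)
oltp.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, " alice smith ", 1250, "completed"),
        (2, "BOB JONES", 800, "pending"),
        (3, "carol white", 4300, "completed"),
    ],
)

warehouse = sqlite3.connect(":memory:")
loaded = run_etl(oltp, warehouse)
sales = warehouse.execute(
    "SELECT customer, amount FROM fact_sales ORDER BY order_id"
).fetchall()
```

Only the completed orders reach the warehouse here, which is one way to read "selectively extracted". A real ETL job would normally run on a schedule and handle incremental loads, which this sketch does not attempt.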

Why do I need a data warehouse?

1. The means to retrieve and analyze data, to extract, transform and load data, and to manage the data dictionary are considered essential components of a data warehousing system.

2. It separates analysis workload from transaction workload and enables an organization to consolidate data from several sources.

3. To integrate data across functions or systems and provide a complete picture of a data subject, e.g. customer orders, customer complaints, salespersons. Doing this on the fly would be time consuming, and the performance of your BI system would be poor.

4. To avoid slowing down the fast-performing transaction systems with resource-intensive queries and reports while routine users, and possibly customers, are executing essential business transactions.

5. To reorganize the data to support fast reporting and querying.

6. To clean up the data for consistency and integrity. Many systems do not have strict input validation, so garbage gets in, such as duplicates (e.g. the same customer entered more than once). There are also often different definitions for the same subject or entity within the business, e.g. customer, client, prospect.
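
As a rough illustration of the clean-up in point 6, the Python sketch below removes duplicate customer records and maps differing terms onto one label. The record fields (name, email, type), the matching rule (same normalized name and email) and the synonym table are assumptions made for this example; real matching rules depend on your own data.

```python
# Assumed mapping of business terms onto one agreed label (illustrative only).
TYPE_SYNONYMS = {"client": "customer"}

def normalize_key(record):
    """Build a comparison key: lower-cased name with collapsed spaces, plus email."""
    name = " ".join(record["name"].lower().split())
    email = record["email"].strip().lower()
    return (name, email)

def clean_customers(records):
    """Keep the first record for each key and standardize its fields."""
    seen = {}
    for rec in records:
        key = normalize_key(rec)
        if key in seen:
            continue  # duplicate entry for a customer we already have
        raw_type = rec["type"].strip().lower()
        seen[key] = {
            "name": key[0].title(),
            "email": key[1],
            "type": TYPE_SYNONYMS.get(raw_type, raw_type),
        }
    return list(seen.values())

# Hypothetical input with the same customer entered twice.
raw = [
    {"name": "John  Smith", "email": "John.Smith@example.com", "type": "Client"},
    {"name": "JOHN SMITH ", "email": "john.smith@example.com ", "type": "customer"},
    {"name": "Mary Jones", "email": "mary@example.com", "type": "prospect"},
]
cleaned = clean_customers(raw)
```

Here the two John Smith rows collapse into one, and "Client" is stored as "customer". Whether a prospect should also count as a customer is a business decision, so the sketch leaves that term alone.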

Because the data stored in a data warehouse is intended for overview-style reporting, it is typically read-only.
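
One way to picture the read-only side is to let reporting tools open the warehouse with a read-only connection. The sketch below does this with Python's sqlite3 module and SQLite's mode=ro URI option; the file name and the monthly_sales table are hypothetical, and a production warehouse would more likely enforce this through database permissions.

```python
import os
import sqlite3
import tempfile
from pathlib import Path

# Build a small hypothetical warehouse file with a read-write "loader" connection.
db_path = Path(os.path.join(tempfile.mkdtemp(), "warehouse.db"))
loader = sqlite3.connect(str(db_path))
loader.execute("CREATE TABLE monthly_sales (month TEXT PRIMARY KEY, total REAL)")
loader.executemany(
    "INSERT INTO monthly_sales VALUES (?, ?)",
    [("2010-06", 1200.0), ("2010-07", 1550.5)],
)
loader.commit()
loader.close()

# Reporting users connect in read-only mode via a file: URI.
reader = sqlite3.connect(db_path.as_uri() + "?mode=ro", uri=True)
totals = reader.execute(
    "SELECT month, total FROM monthly_sales ORDER BY month"
).fetchall()

# Any attempt to change the data through this connection is rejected.
try:
    reader.execute("DELETE FROM monthly_sales")
    write_blocked = False
except sqlite3.OperationalError:
    write_blocked = True
reader.close()
```

Queries work as usual through the reader connection, while the load process keeps its own separate read-write connection.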
