Thursday, August 5, 2010

Data Warehouse Dimensional Model Components

Data Warehouse Dimensional Model Components

After the data warehousing architect locates all data elements necessary to support the data warehouse, it is time to build a dimensional model. Dimensional model is equivalent of logical data design of Data Warehouse,

Dimensional Modeling Concept

Detailed data analysis of a single business process is performed to identify the fact table granularity, associated dimensions and attributes, and numeric facts. Dimensional models contain the same data content and relationships as models normalized into third normal form, but structured differently. To Improve understandability and query performance required by DW/BI.

Primary constructs of a dimensional model

fact tables

dimension tables

Every dimensional model is composed of one table with a multi-part key, called the fact table, and a set of smaller tables called dimension tables. Each dimension table has a single-part primary key that corresponds exactly to one of the components of the multi-part key in the fact table.

Fact tables

A fact table is a table that contains the measures of interest A fact table is a table that stores facts that measure the business, such as sales, cost of goods, or profit. Fact tables also contain foreign keys to the dimension tables. These foreign keys relate each row of data in the fact table to its corresponding dimensions and levels.

Dimension tables

A dimension table is a table that stores attributes that describe aspects of a dimension. For example, a time table stores the various aspects of time such as year, quarter, month, and day. A foreign key of a fact table references the primary key in a dimension table in a many-to-one relationship.

Star Schema
Star Schema is a relational database schema for representing multimensional data. It is the simplest form of data warehouse schema that contains one or more dimensions and fact tables. It is called a star schema because the entity-relationship diagram between dimensions and fact tables resembles a star where one fact table is connected to multiple dimensions. The center of the star schema consists of a large fact table and it points towards the dimension tables. The advantage of star schema are slicing down, performance increase and easy understanding of data.

Steps in designing Star Schema

· Identify a business process for analysis(like sales).

· Identify measures or facts (dollar sold).

· Identify dimensions for facts(time dimension, location dimension, item dimension, branch dimension).

· List the columns that describe each dimension.(item name, brand, country).

· Determine the lowest level of summary in a fact table(sales dollar).


Star Schema

Wednesday, August 4, 2010

Data Warehousing

A data warehouse is a repository of an organization's electronically stored data, designed to facilitate reporting and analysis.

It is a relational database designed for query and analysis rather than for transaction processing. It usually contains historical data derived from transaction data, but it can include data from other sources

Definition:

A data warehouse is a central repository for all or significant parts of the data that an enterprise's various business systems collect. The term was coined by W. H. Inmon.

Data from various online transaction processing (OLTP) applications and other sources is selectively extracted and organized on the data warehouse database for use by analytical applications and user queries.

Why do I need a data warehouse?

1. The means to retrieve and analyze data, to extract, transform and load data, and to manage the data dictionary are considered essential components of a data warehousing system.

2. It separates analysis workload from transaction workload and enables an organization to consolidate data from several sources.

3. To integrate data across functions or systems to provide a complete picture of the data subject e.g. customer orders, customer complaints, salespersons. To do this on the fly or run, would be time coming and performance of your BI system would be poor.

4. To avoid interference with the fast performing transaction systems by running large computer resource queries and reports whilst routine users and possibly customers are executing the essential business transactions.

5. To reorganize the data to support fast reporting and querying.

6. To clean up the quality of the data to give consistency and data integrity. Many systems do not have strict input validation and garbage gets in ... duplicates e.g. same customer entered more than once. Also there often different definitions for the same subject or entity within the business e.g. customer, client, prospect.

Routinely, because the data stored in data warehouses is intended to provide more overview-like reporting, the data is read-only.