Dimensional modeling is a design technique used in data warehousing that organizes data into dimensions and facts to support efficient querying and analysis.
A dimension is a categorical or descriptive attribute of a data set. For example, in a sales data set, dimensions could include date, product, customer, and store. Each dimension has a set of attributes that describe it, such as date attributes like day, month, and year.
A fact is a measurable quantity or event in a data set that can be analyzed. In the sales data set example, facts could include sales revenue, units sold, and discounts given. Each fact has associated measures that can be analyzed, such as revenue by product, units sold by store, or discounts given by customer.
Here is an example of a dimension and fact table in a web format:
Dimension Table: Customer
Customer Key | First Name | Last Name | Address | City | State | Zip Code |
---|---|---|---|---|---|---|
1 | John | Smith | 123 Main St | New York City | NY | 10001 |
2 | Jane | Doe | 456 Oak St | San Francisco | CA | 94101 |
3 | Bob | Johnson | 789 Elm St | Boston | MA | 02108 |
Fact Table: Sales
Date Key | Product Key | Customer Key | Store Key | Units Sold | Revenue |
---|---|---|---|---|---|
20220101 | 1 | 1 | 1 | 10 | 100.00 |
20220101 | 2 | 2 | 1 | 5 | 75.00 |
20220101 | 3 | 3 | 2 | 2 | 20.00 |
20220102 | 1 | 2 | 2 | 7 | 70.00 |
20220102 | 2 | 1 | 1 | 12 | 180.00 |
20220102 | 3 | 3 | 1 | 4 | 40.00 |
In the above example, the Customer table is a dimension table with descriptive attributes for each customer, such as their name, address, city, state, and zip code. The Customer Key is a unique identifier for each customer.
The Sales table is a fact table with measurable quantities or events, such as Units Sold and Revenue. It also includes foreign keys to the associated dimension tables, such as Date Key, Product Key, Customer Key, and Store Key. These foreign keys link the fact table to the appropriate dimension table, so that we can analyze sales by various dimensions.
Dimensional modeling is useful because it allows analysts and business users to easily query and analyze data by various dimensions and measures. For example, we can analyze sales by product, date, customer, and store, or we can analyze customer demographics by city, state, and zip code. By organizing data into dimensions and facts, we can easily navigate complex data sets and extract meaningful insights.
- Natural Key: A natural key is a type of primary key that uses existing data within the table to uniquely identify each record. In the example above, the natural key for the
customer
dimension is the combination of thecustomer_name
andcustomer_email
columns. This is because this combination is unique for each customer record, and therefore can be used as a reliable way to identify each record. - Primary Key: A primary key is a column or set of columns in a table that uniquely identifies each record in the table. In the
customer
dimension table above, the primary key is thecustomer_id
column. This is because thecustomer_id
column uniquely identifies each record in the table, and is therefore a reliable way to reference each customer record. - Unique Key: A unique key is a constraint that ensures that the values in a certain column or set of columns are unique for each record in a table. In the
customer
dimension table above, thecustomer_email
column is an example of a unique key. This is because each email address should only belong to one customer, and therefore thecustomer_email
column should not contain duplicate values. - Surrogate Key: A surrogate key is a unique identifier assigned to each record in a table that does not have any inherent meaning, and is not based on any data in the table. Surrogate keys are often used in data warehouses and dimensional modeling to simplify the process of referencing records across different tables. In the
customer
dimension table above, thecustomer_id
column is an example of a surrogate key, because it is a unique identifier assigned to each customer record that does not have any inherent meaning. - Foreign Key:
- In the
sales_fact
table, the foreign keys arecustomer_id
,product_id
, anddate_id
. These columns reference the corresponding primary keys in thecustomer
,product
, anddate
dimension tables, respectively. - For example, the
customer_id
foreign key in thesales_fact
table is used to reference thecustomer_id
primary key in thecustomer
dimension table. This allows us to join the two tables together and analyze the sales data at a customer level. - Similarly, the
product_id
anddate_id
foreign keys in thesales_fact
table reference theproduct_id
anddate_id
primary keys in their respective dimension tables. This allows us to join the tables together and analyze the sales data at a product and date level, respectively.
- In the