Dimensional modeling is a design technique used in data warehousing that organizes data into dimensions and facts to support efficient querying and analysis.

A dimension is a categorical or descriptive attribute of a data set. For example, in a sales data set, dimensions could include date, product, customer, and store. Each dimension has a set of attributes that describe it, such as date attributes like day, month, and year.

A fact is a measurable quantity or event in a data set that can be analyzed. In the sales data set example, facts could include sales revenue, units sold, and discounts given. Each fact has associated measures that can be analyzed, such as revenue by product, units sold by store, or discounts given by customer.

Here is an example of a dimension and fact table in a web format:

Dimension Table: Customer

Customer KeyFirst NameLast NameAddressCityStateZip Code
1JohnSmith123 Main StNew York CityNY10001
2JaneDoe456 Oak StSan FranciscoCA94101
3BobJohnson789 Elm StBostonMA02108

Fact Table: Sales

Date KeyProduct KeyCustomer KeyStore KeyUnits SoldRevenue
2022010111110100.00
20220101221575.00
20220101332220.00
20220102122770.00
2022010221112180.00
20220102331440.00

In the above example, the Customer table is a dimension table with descriptive attributes for each customer, such as their name, address, city, state, and zip code. The Customer Key is a unique identifier for each customer.

The Sales table is a fact table with measurable quantities or events, such as Units Sold and Revenue. It also includes foreign keys to the associated dimension tables, such as Date Key, Product Key, Customer Key, and Store Key. These foreign keys link the fact table to the appropriate dimension table, so that we can analyze sales by various dimensions.

Dimensional modeling is useful because it allows analysts and business users to easily query and analyze data by various dimensions and measures. For example, we can analyze sales by product, date, customer, and store, or we can analyze customer demographics by city, state, and zip code. By organizing data into dimensions and facts, we can easily navigate complex data sets and extract meaningful insights.

  • Natural Key: A natural key is a type of primary key that uses existing data within the table to uniquely identify each record. In the example above, the natural key for the customer dimension is the combination of the customer_name and customer_email columns. This is because this combination is unique for each customer record, and therefore can be used as a reliable way to identify each record.
  • Primary Key: A primary key is a column or set of columns in a table that uniquely identifies each record in the table. In the customer dimension table above, the primary key is the customer_id column. This is because the customer_id column uniquely identifies each record in the table, and is therefore a reliable way to reference each customer record.
  • Unique Key: A unique key is a constraint that ensures that the values in a certain column or set of columns are unique for each record in a table. In the customer dimension table above, the customer_email column is an example of a unique key. This is because each email address should only belong to one customer, and therefore the customer_email column should not contain duplicate values.
  • Surrogate Key: A surrogate key is a unique identifier assigned to each record in a table that does not have any inherent meaning, and is not based on any data in the table. Surrogate keys are often used in data warehouses and dimensional modeling to simplify the process of referencing records across different tables. In the customer dimension table above, the customer_id column is an example of a surrogate key, because it is a unique identifier assigned to each customer record that does not have any inherent meaning.
  • Foreign Key:
    • In the sales_fact table, the foreign keys are customer_id, product_id, and date_id. These columns reference the corresponding primary keys in the customer, product, and date dimension tables, respectively.
    • For example, the customer_id foreign key in the sales_fact table is used to reference the customer_id primary key in the customer dimension table. This allows us to join the two tables together and analyze the sales data at a customer level.
    • Similarly, the product_id and date_id foreign keys in the sales_fact table reference the product_id and date_id primary keys in their respective dimension tables. This allows us to join the tables together and analyze the sales data at a product and date level, respectively.