SCD Type 2 (Slowly Changing Dimension Type 2) is a technique used in dimensional modeling to manage changes to dimension data over time. This technique is used to maintain historical records of changes to the dimension data, while also keeping track of the current state of the data.
In SCD Type 2, a new row is added to the dimension table every time there is a change to the dimension data. The new row contains the updated values of the dimension attributes, along with a new surrogate key and a start and end date that specify the validity period of the row. The previous row is marked as inactive by setting the end date to the date of the change.
customer_id | first_name | last_name | address | city | state | zip_code | effective_date | end_date |
---|---|---|---|---|---|---|---|---|
1 | John | Smith | 123 Main St | New York | NY | 10001 | 01-Jan-2022 | 31-Jan-2022 |
2 | Jane | Doe | 456 Elm St | Los Angeles | CA | 90001 | 01-Jan-2022 | 31-Dec-9999 |
3 | John | Smith | 789 Broadway St | New York | NY | 10001 | 01-Feb-2022 | 31-Dec-9999 |
In this data given above, each column represents a unique customer attribute, and each row represents a unique customer record with a unique customer_id
. The effective_date
and end_date
columns indicate the time period when each record was active.
For example, the first record for John Smith shows that he lived at 123 Main St in New York City with a zip code of 10001, and that this record was effective from January 1, 2022, to January 31, 2022. The third record for John Smith shows that he lived at 789 Broadway St in New York City with the same zip code, but that this record was effective from February 1, 2022, onwards, indicating that John moved to a new address. The end_date
of the first record and the effective_date
of the third record form a “gap” where there was no active record for John Smith during the month of February 2022.