Here are two tables ‘orders’ and ‘products’ in a tabular format. Here’s how join can be performed in the tables below.
order_id | customer_id | order_date |
---|---|---|
1001 | 2001 | 2022-03-15 |
1002 | 2002 | 2022-03-16 |
1003 | 2003 | 2022-03-17 |
1004 | 2004 | 2022-03-18 |
1005 | 2005 | 2022-03-19 |
product_id | product_name | category | price |
---|---|---|---|
5001 | Phone | Electronics | 699 |
5002 | Headphones | Electronics | 99 |
5003 | Shirt | Clothing | 29 |
5004 | Jeans | Clothing | 49 |
5005 | Socks | Clothing | 9 |
Here are the different types of join in PostgreSQL:
- Inner Join
An Inner Join returns only the matching records from both tables. In other words, it returns the intersection of the two tables. The SQL code for an inner join is as follows:
SELECT *
FROM orders
INNER JOIN products
ON orders.order_id = products.product_id;
The output of the inner join would be:
order_id | customer_id | order_date | product_id | product_name | category | price |
---|---|---|---|---|---|---|
1001 | 2001 | 2022-03-15 | 5001 | Phone | Electronics | 699 |
1002 | 2002 | 2022-03-16 | 5002 | Headphones | Electronics | 99 |
1003 | 2003 | 2022-03-17 | 5003 | Shirt | Clothing | 29 |
1004 | 2004 | 2022-03-18 | 5004 | Jeans | Clothing | 49 |
1005 | 2005 | 2022-03-19 | 5005 | Socks | Clothing | 9 |
- Left Join
A left join returns all the records from the left table and the matching records from the right table. If there is no matching record in the right table, the result will be NULL for those fields. The SQL code for a left join is as follows:
SELECT *
FROM orders
LEFT JOIN products
ON orders.order_id = products.product_id;
The output of the left join would be:
order_id | customer_id | order_date | product_id | product_name | category | price |
---|---|---|---|---|---|---|
1001 | 2001 | 2022-03-15 | 5001 | Phone | Electronics | 699 |
1002 | 2002 | 2022-03-16 | 5002 | Headphones | Electronics | 99 |
1003 | 2003 | 2022-03-17 | 5003 | Shirt | Clothing | 29 |
Here are the SQL statements to create the orders
and products
tables, and insert the sample data provided:
-- Creating the orders table
CREATE TABLE orders (
order_id INT,
product_id INT,
order_date DATE
);
-- Creating the products table
CREATE TABLE products (
product_id INT,
product_name VARCHAR(50),
price NUMERIC(10,2)
);
-- Inserting data into the orders table
INSERT INTO orders (order_id, product_id, order_date)
VALUES (1, 3, '2022-03-15'),
(2, 2, '2022-03-16'),
(3, 4, '2022-03-17'),
(4, 1, '2022-03-18');
-- Inserting data into the products table
INSERT INTO products (product_id, product_name, price)
VALUES (1, 'Shoes', 50),
(2, 'Pants', 25),
(3, 'T-shirt', 15),
(5, 'Hat', 10);
Order By, Group By, Having and Partition By
- ORDER BY:
The ORDER BY clause is used to sort the result set in ascending or descending order based on one or more columns.
Example code:
SELECT * FROM orders
ORDER BY order_date DESC;
Output:
order_id | customer_id | order_date |
---|---|---|
3 | 101 | 2021-03-15 |
1 | 102 | 2021-02-28 |
2 | 103 | 2021-02-20 |
4 | 104 | 2021-01-30 |
In this example, we are selecting all columns from the “orders” table and sorting the result set by the “order_date” column in descending order.
- GROUP BY:
The GROUP BY clause is used to group the result set based on one or more columns.
Example code:
SELECT customer_id, COUNT(*) as total_orders
FROM orders
GROUP BY customer_id;
Output:
customer_id | total_orders |
---|---|
101 | 1 |
102 | 1 |
103 | 1 |
104 | 1 |
In this example, we are grouping the result set by the “customer_id” column and counting the total number of orders for each customer using the COUNT() function.
- HAVING:
The HAVING clause is used to filter the result set based on a condition applied to a grouped column.
Example code:
SELECT customer_id, COUNT(*) as total_orders
FROM orders
GROUP BY customer_id
HAVING COUNT(*) > 1;
Output:
customer_id | total_orders |
---|---|
101 | 2 |
In this example, we are grouping the result set by the “customer_id” column and counting the total number of orders for each customer. We are then filtering the result set to only include customers who have placed more than one order using the HAVING clause.
- PARTITION BY:
The PARTITION BY clause is used to divide the result set into partitions based on one or more columns.
Example code:
SELECT order_id, customer_id, order_date, SUM(order_amount) OVER (PARTITION BY customer_id) as total_spent
FROM orders;
Output:
order_id | customer_id | order_date | total_spent |
---|---|---|---|
1 | 102 | 2021-02-28 | 300 |
2 | 103 | 2021-02-20 | 500 |
3 | 101 | 2021-03-15 | 600 |
4 | 104 | 2021-01-30 | 700 |
In this example, we are selecting all columns from the “orders” table and using the SUM() function with the OVER clause and the PARTITION BY clause to calculate the total amount spent by each customer. The result set is divided into partitions based on the “customer_id” column.
Date time functions in SQL
here are some common date and time functions in PostgreSQL and how they can be used with the sample data provided:
1. DATE_TRUNC
DATE_TRUNC
function is used to truncate a date value to a specific precision. This function is useful in grouping date values by time periods such as month, year, week, etc. The syntax for the function is as follows:
DATE_TRUNC(precision, field)
Where precision
can be any of the following:
YEAR
QUARTER
MONTH
WEEK
DAY
HOUR
MINUTE
SECOND
And field
is the input date value.
For example, to group the orders table by month and get the total sales amount for each month, we can use the following query:
SELECT DATE_TRUNC('month', order_date) AS month, SUM(total_price) AS total_sales
FROM orders
GROUP BY 1
ORDER BY 1;
The output will be:
month | total_sales |
---|---|
2021-01-01 | 1200.00 |
2021-02-01 | 500.00 |
2021-03-01 | 1500.00 |
2. EXTRACT
EXTRACT
function is used to extract a specific component from a date value, such as year, month, day, etc. The syntax for the function is as follows:
EXTRACT(field FROM date)
Where field
can be any of the following:
YEAR
MONTH
DAY
HOUR
MINUTE
SECOND
For example, to extract the year and month from the order_date column in the orders table, we can use the following query:
SELECT EXTRACT(YEAR FROM order_date) AS year, EXTRACT(MONTH FROM order_date) AS month, total_price
FROM orders;
The output will be:
year | month | total_price |
---|---|---|
2021 | 1 | 1000.00 |
2021 | 1 | 200.00 |
2021 | 2 | 500.00 |
2021 | 3 | 1000.00 |
2021 | 3 | 500.00 |
3. DATE_PART
DATE_PART
function is similar to EXTRACT
, but it allows for more flexibility in specifying the date component. The syntax for the function is as follows:
DATE_PART(field, date)
Where field
can be any of the following:
year
month
day
hour
minute
second
For example, to extract the year and month from the order_date column in the orders table using DATE_PART
, we can use the following query:
SELECT DATE_PART('year', order_date) AS year, DATE_PART('month', order_date) AS month, total_price
FROM orders;
The output will be:
year | month | total_price |
---|---|---|
2021 | 1 | 1000.00 |
2021 | 1 | 200.00 |
2021 | 2 | 500.00 |
2021 | 3 | 100 |
GROUP BY FUNCTIONS
here are some commonly used GROUP BY functions in PostgreSQL with code examples and output using the data above:
- SUM: calculates the sum of values for each group.
Code:
SELECT category, SUM(price) as total_price
FROM products
GROUP BY category;
Output:
category | total_price |
---|---|
Electronics | 1200 |
Clothing | 550 |
Furniture | 500 |
Beauty | 250 |
Sports | 450 |
- AVG: calculates the average of values for each group.
Code:
SELECT category, AVG(price) as avg_price
FROM products
GROUP BY category;
Output:
category | avg_price |
---|---|
Electronics | 400 |
Clothing | 183.33 |
Furniture | 166.67 |
Beauty | 125 |
Sports | 150 |
- MAX: retrieves the maximum value for each group.
Code:
SELECT category, MAX(price) as max_price
FROM products
GROUP BY category;
Output:
category | max_price |
---|---|
Electronics | 700 |
Clothing | 250 |
Furniture | 300 |
Beauty | 150 |
Sports | 250 |
- MIN: retrieves the minimum value for each group.
Code:
SELECT category, MIN(price) as min_price
FROM products
GROUP BY category;
Output:
category | min_price |
---|---|
Electronics | 400 |
Clothing | 100 |
Furniture | 100 |
Beauty | 50 |
Sports | 100 |
- COUNT: counts the number of rows in each group.
Code:
SELECT category, COUNT(*) as count_products
FROM products
GROUP BY category;
Output:
category | count_products |
---|---|
Electronics | 3 |
Clothing | 3 |
Furniture | 3 |
Beauty | 2 |
Sports | 3 |
These are just a few examples of the GROUP BY functions in PostgreSQL. There are many more functions available such as VARIANCE, STDDEV, etc. that can be used to get useful insights from data.
WINDOW FUNCTIONS
Here are some examples of window functions in PostgreSQL using the sample data:
- ROW_NUMBER() The ROW_NUMBER() function assigns a unique number to each row within a partition, with the number starting from 1.
Code:
SELECT order_id, product_name, price, ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY price DESC) AS row_num
FROM order_products;
Output:
order_id | product_name | price | row_num |
---|---|---|---|
1001 | Apples | 2.99 | 1 |
1001 | Bananas | 1.99 | 2 |
1001 | Oranges | 0.99 | 3 |
1002 | Milk | 3.49 | 1 |
1002 | Eggs | 2.99 | 2 |
1002 | Bread | 1.99 | 3 |
1003 | Cheese | 4.99 | 1 |
1003 | Crackers | 2.49 | 2 |
Explanation: The ROW_NUMBER() function partitions the data by order_id and orders it by price in descending order. The row_num column shows the assigned row number for each product within its respective order.
- RANK() The RANK() function assigns a rank to each row within a partition based on its value, with tied values receiving the same rank.
Code:
SELECT order_id, product_name, price, RANK() OVER (PARTITION BY order_id ORDER BY price DESC) AS rank
FROM order_products;
Output:
order_id | product_name | price | rank |
---|---|---|---|
1001 | Apples | 2.99 | 1 |
1001 | Bananas | 1.99 | 2 |
1001 | Oranges | 0.99 | 3 |
1002 | Milk | 3.49 | 1 |
1002 | Eggs | 2.99 | 2 |
1002 | Bread | 1.99 | 3 |
1003 | Cheese | 4.99 | 1 |
1003 | Crackers | 2.49 | 2 |
Explanation: The RANK() function assigns the same rank to products with the same price within their respective order, resulting in ties between Bananas and Oranges for order 1001, and between Eggs and Bread for order 1002.
DENSE_RANK()
: This function assigns a rank to each row within a result set partition, with ties receiving the same rank, and the next rank being the next integer value after the number of distinct ranks. TheDENSE_RANK()
function returns an integer value.
Example:
Suppose we want to assign a rank to each order based on the order date, but we want to handle ties by assigning the same rank to tied orders, and we want to skip ranks between distinct groups of orders. We can use the DENSE_RANK()
function as follows:
SELECT order_id, order_date, DENSE_RANK() OVER (ORDER BY order_date) as dense_rank
FROM orders
ORDER BY dense_rank;
Output:
order_id | order_date | dense_rank |
---|---|---|
1003 | 2022-01-01 | 1 |
1004 | 2022-01-02 | 2 |
1005 | 2022-01-02 | 2 |
1006 | 2022-01-03 | 3 |
1007 | 2022-01-04 | 4 |
1008 | 2022-01-04 | 4 |
1001 | 2022-01-05 | 5 |
1002 | 2022-01-05 | 5 |
1009 | 2022-01-06 | 6 |
In this example, we use the DENSE_RANK()
function with the ORDER BY
clause to assign a rank to each order based on the order date. Since there are two orders with the same date (orders 1004 and 1005), the DENSE_RANK()
function assigns the same rank (2) to both orders, and the next rank (3) is skipped.