Suppose we have two DataFrames, orders and order_products:

Python
import polars as pl

# Load the Orders dataset into a Polars DataFrame
orders_df = pl.read_csv("orders.csv")

# Load the Order Products dataset into a Polars DataFrame
order_products_df = pl.read_csv("order_products.csv")

orders.csv:

order_id  user_id  order_number  order_dow  order_hour_of_day  days_since_prior_order
1         112108   4             4          10                 NaN
2         79431    8             1          9                  15.0
3         42756    6             6          17                 21.0
4         17227    2             1          9                  29.0
5         56463    7             6          14                 28.0

order_products.csv:

order_id  product_id  add_to_cart_order  reordered
1         49302       1                  1
1         11109       2                  1
1         10246       3                  0
1         49683       4                  0
1         43633       5                  1

Inner join:

Python
# Inner join the two datasets on the 'order_id' column
orders_products_inner_join = orders_df.join(order_products_df, on='order_id', how='inner')
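
Output for inner join (illustrative; this sample has order_date and quantity columns, unlike the CSV previews above):

order_id  user_id  order_number  order_date  product_id  quantity
101       1001     1             2022-03-15  2001        2
101       1001     1             2022-03-15  2002        3
102       1001     2             2022-03-17  2003        1
102       1001     2             2022-03-17  2004        2
103       1002     1             2022-03-18  2002        1
103       1002     1             2022-03-18  2003        3
103       1002     1             2022-03-18  2004        2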

Left join:

Python

# Left join the two datasets on the 'order_id' column
orders_products_left_join = orders_df.join(order_products_df, on='order_id', how='left')

Output for left join:

order_id  user_id  order_number  order_date  product_id  quantity
101       1001     1             2022-03-15  2001        2
101       1001     1             2022-03-15  2002        3
102       1001     2             2022-03-17  2003        1
102       1001     2             2022-03-17  2004        2
103       1002     1             2022-03-18  2002        1
103       1002     1             2022-03-18  2003        3
103       1002     1             2022-03-18  2004        2
104       1003     1             2022-03-20  None        None

Polars is a fast DataFrame library written in Rust that processes large datasets efficiently. It supports several join types, including inner, left, right, and outer (called full in recent releases) joins. This section covers two further techniques: joining on multiple columns and choosing the join type.

  1. Joining on Multiple Columns:

Polars supports joining on multiple columns at once. To do so, pass a list of column names to the left_on and right_on arguments instead of a single name (or to on, when the key columns have the same names in both DataFrames). For example, to join two DataFrames on columns col1 and col2, we can use the following code:

import polars as pl

df1 = pl.DataFrame({'col1': [1, 2, 3, 4], 'col2': ['a', 'b', 'c', 'd']})
df2 = pl.DataFrame({'col1': [1, 2, 4, 5], 'col2': ['a', 'b', 'd', 'e']})

joined_df = df1.join(df2, left_on=['col1', 'col2'], right_on=['col1', 'col2'])

This performs an inner join (the default) and keeps only the rows where both col1 and col2 match in the two DataFrames: here (1, 'a'), (2, 'b'), and (4, 'd').

  2. Joining with Different Join Types:

By default, join() performs an inner join. To choose a different join type, pass the how argument. The join types covered here are inner, left, right, and outer; recent Polars releases name the outer join full, while older releases used outer. Polars also offers semi, anti, and cross joins.

import polars as pl

df1 = pl.DataFrame({'col1': [1, 2, 3, 4], 'col2': ['a', 'b', 'c', 'd']})
df2 = pl.DataFrame({'col1': [1, 2, 4, 5], 'col3': ['x', 'y', 'z', 'w']})

# Inner Join
joined_df = df1.join(df2, left_on='col1', right_on='col1', how='inner')

# Left Join
joined_df = df1.join(df2, left_on='col1', right_on='col1', how='left')

# Right Join
joined_df = df1.join(df2, left_on='col1', right_on='col1', how='right')

# Outer (full) Join: recent Polars releases use how='full'; older ones used how='outer'
joined_df = df1.join(df2, left_on='col1', right_on='col1', how='full')