Understanding JOIN Clauses in SQL

In a previous article, I demonstrated how we can manipulate and retrieve data from any single table in our database, which covers a wide range of uses. However, this leaves out some of the most important functionality of a relational database – the ability to work with data across multiple tables. JOIN clauses are the missing component that will allow us to incorporate interconnected data in meaningful ways, using more advanced queries. In this article, I’ll be explaining the various types of joins, along with their significance.

Types of JOINs

JOIN clauses combine rows from multiple tables by leveraging matching information in each table. There are four main kinds of JOIN clauses that are useful in different circumstances. They are:

  • LEFT JOIN
  • RIGHT JOIN
  • INNER JOIN
  • FULL JOIN

To explain the differences between these JOIN types and when each is useful, we'll work with three example tables containing test data on orders, along with the customers and products associated with those orders. To keep the example simple, each order is associated with only one product, although real-world orders often involve multiple items.

Orders

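| id | customer_id | product_id | submitted_at        |
|----|-------------|------------|---------------------|
| 1  | 1           | 1          | 2023-06-01 14:20:00 |
| 2  | 2           | 3          | 2023-06-22 12:00:00 |
| 3  | NULL        | 4          | 2023-06-20 15:30:00 |

An Orders table containing the IDs of associated customers and products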

Customers

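| id | email             | first_name | last_name | phone        |
|----|-------------------|------------|-----------|--------------|
| 1  | john@example.com  | John       | Doe       | 123-456-7890 |
| 2  | jane@example.com  | Jane       | Doe       | 111-222-3333 |
| 3  | david@example.com | David      | Smith     | 999-888-7777 |

A Customers table containing basic info for each customer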

Products

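| id | name       | price  | category    |
|----|------------|--------|-------------|
| 1  | Headphones | 199.99 | Electronics |
| 2  | Laptop     | 799.99 | Electronics |
| 3  | Shirt      | 29.99  | Apparel     |
| 4  | Shoes      | 99.99  | Footwear    |

A Products table containing basic info for each product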

Instead of directly containing information about customers and products, each order has an ID that belongs to a record within the customer or product table, meaning we don’t have to store duplicate information.
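For readers who want to follow along, the example tables could be created with something like the statements below. This is only a sketch: the column types, the PostgreSQL-style syntax, and the foreign key constraints are assumptions, since the tables above only show column names and sample values.

```sql
-- Column types and constraints are assumptions, not part of the original example.
CREATE TABLE customers (
    id         INTEGER PRIMARY KEY,
    email      VARCHAR(255),
    first_name VARCHAR(100),
    last_name  VARCHAR(100),
    phone      VARCHAR(20)
);

CREATE TABLE products (
    id       INTEGER PRIMARY KEY,
    name     VARCHAR(100),
    price    DECIMAL(10, 2),
    category VARCHAR(100)
);

CREATE TABLE orders (
    id           INTEGER PRIMARY KEY,
    customer_id  INTEGER REFERENCES customers (id),  -- nullable: order 3 has no customer
    product_id   INTEGER REFERENCES products (id),
    submitted_at TIMESTAMP
);

-- Customers and products are inserted first so the order rows can reference them.
INSERT INTO customers (id, email, first_name, last_name, phone) VALUES
    (1, 'john@example.com',  'John',  'Doe',   '123-456-7890'),
    (2, 'jane@example.com',  'Jane',  'Doe',   '111-222-3333'),
    (3, 'david@example.com', 'David', 'Smith', '999-888-7777');

INSERT INTO products (id, name, price, category) VALUES
    (1, 'Headphones', 199.99, 'Electronics'),
    (2, 'Laptop',     799.99, 'Electronics'),
    (3, 'Shirt',       29.99, 'Apparel'),
    (4, 'Shoes',       99.99, 'Footwear');

INSERT INTO orders (id, customer_id, product_id, submitted_at) VALUES
    (1, 1,    1, '2023-06-01 14:20:00'),
    (2, 2,    3, '2023-06-22 12:00:00'),
    (3, NULL, 4, '2023-06-20 15:30:00');
```

Other database systems may need slightly different type names or constraint syntax.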

To return all of the customer and product information relating to a given order, we’ll need to JOIN the tables together. This can be done using the different types of JOIN clauses, depending on what information we want returned.

LEFT JOIN

The LEFT JOIN (also known as a LEFT OUTER JOIN) is a fairly typical way to combine information from multiple tables. As the illustration shows, it returns every row from the original table (in this case labeled A), along with data from table B wherever the ON condition is fulfilled.

The ON condition is used to specify how we want our two tables to be joined together. In our example orders table, we have columns containing a product ID and a customer ID. In order to reference customer information when querying the order, we can specify that the customer_id column within the orders table should match the id column within the customers table. This is shown in the following query:

SQL
SELECT orders.id, orders.submitted_at,
       customers.first_name, customers.last_name
FROM orders
LEFT JOIN customers ON orders.customer_id = customers.id;

This query will return the ID and submission date of every order in the orders table, but will also return the first and last name of the customer for orders that have a matching customer_id. If an order does not have a customer_id that matches an ID in the customers table, the customer columns will be NULL (or empty) for that order, as shown in the results:

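| id | submitted_at        | first_name | last_name |
|----|---------------------|------------|-----------|
| 1  | 2023-06-01 14:20:00 | John       | Doe       |
| 2  | 2023-06-22 12:00:00 | Jane       | Doe       |
| 3  | 2023-06-20 15:30:00 | NULL       | NULL      |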

Because order 3 does not have a customer_id, no customer information was returned for that row. Likewise, because no order references customer 3 (David Smith), that customer does not appear in the results.
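The NULL values that a LEFT JOIN produces for unmatched rows can also be filtered on. As a minimal sketch, assuming the example tables above, the following query would list only the orders that have no matching customer:

```sql
-- Keep only the orders for which the LEFT JOIN found no customer row.
SELECT orders.id, orders.submitted_at
FROM orders
LEFT JOIN customers ON orders.customer_id = customers.id
WHERE customers.id IS NULL;
```

With the example data, this should return only order 3.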

Multiple tables can easily be joined within a single query by adding more JOIN clauses. For example, we can join both the customers and products tables to the orders table to obtain more information about order 2:

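| id | submitted_at        | email            | product_name | product_price |
|----|---------------------|------------------|--------------|---------------|
| 2  | 2023-06-22 12:00:00 | jane@example.com | Shirt        | 29.99         |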
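The query behind this result is not reproduced here. A minimal sketch that should return the same row, assuming the example tables above and using two LEFT JOIN clauses (other JOIN types would work equally well for this particular order), might look like this:

```sql
-- Join both customers and products to orders, then narrow the result to order 2.
SELECT orders.id, orders.submitted_at, customers.email,
       products.name AS product_name,
       products.price AS product_price
FROM orders
LEFT JOIN customers ON orders.customer_id = customers.id
LEFT JOIN products ON orders.product_id = products.id
WHERE orders.id = 2;
```

Each additional JOIN clause carries its own ON condition, so the two lookups are independent of one another.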

RIGHT JOIN

The RIGHT JOIN (or RIGHT OUTER JOIN) is very similar to the LEFT JOIN, but is far less frequently used. Both JOIN clauses return all matching rows, but the RIGHT JOIN also returns the unmatched rows from the joined table rather than from the original table.

Technically, the RIGHT JOIN provides no additional functionality, as any statement using the RIGHT JOIN can be rewritten using LEFT JOIN clauses. While it can be a situationally useful tool, a LEFT JOIN is almost always considered to be more readable and straightforward.

SQL
SELECT orders.id, orders.submitted_at,
       products.name AS product_name,
       products.price AS product_price
FROM orders
RIGHT JOIN products ON orders.product_id = products.id;

| id   | submitted_at        | product_name | product_price |
|------|---------------------|--------------|---------------|
| 1    | 2023-06-01 14:20:00 | Headphones   | 199.99        |
| NULL | NULL                | Laptop       | 799.99        |
| 2    | 2023-06-22 12:00:00 | Shirt        | 29.99         |
| 3    | 2023-06-20 15:30:00 | Shoes        | 99.99         |

By using a RIGHT JOIN, we can return every row of the products table, along with any order that matches each product. Because no orders exist for the Laptop product, the row containing that product has no order information.
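To illustrate the point that a RIGHT JOIN can be rewritten as a LEFT JOIN, here is a sketch of the same query with the table order swapped. Assuming the example tables above, it should return the same rows as the RIGHT JOIN version, although without an ORDER BY clause the row order is not guaranteed in either case:

```sql
-- products is now the original (left) table, so all of its rows are kept.
SELECT orders.id, orders.submitted_at,
       products.name AS product_name,
       products.price AS product_price
FROM products
LEFT JOIN orders ON orders.product_id = products.id;
```

Only the FROM and JOIN lines change; the selected columns and the ON condition stay the same.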

INNER JOIN

While the LEFT and RIGHT JOIN clauses return certain unmatched rows alongside rows that fulfill the ON condition, the INNER JOIN returns exclusively matching records. When looking at the visual representation of tables A and B, this equates to only the overlapping portion in the center being returned.

INNER JOIN clauses are useful when only looking for records that have a match in the other table – all other records are excluded. To demonstrate this, we can INNER JOIN the customers table to the orders table like so:

SQL
SELECT orders.id, orders.submitted_at, customers.email
FROM orders
INNER JOIN customers ON orders.customer_id = customers.id;

| id | submitted_at        | email            |
|----|---------------------|------------------|
| 1  | 2023-06-01 14:20:00 | john@example.com |
| 2  | 2023-06-22 12:00:00 | jane@example.com |

Using an INNER JOIN, we can see that the query returned only matching records. Order 3 was not returned because it did not reference any customers, and customer 3 was not included in the results because no order references it.

FULL JOIN

The last of the four main types, the FULL JOIN (or FULL OUTER JOIN), includes all records that fulfill the ON condition, but also includes all other records from both tables. Records from either side that remain unmatched will contain NULL values in the columns belonging to the other table.

The effects of a FULL JOIN can be demonstrated clearly by using the exact same SQL statement as before, only replacing the INNER JOIN clause with a FULL JOIN clause.

SQL
SELECT orders.id, orders.submitted_at, customers.email
FROM orders
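FULL JOIN customers ON orders.customer_id = customers.id;

| id   | submitted_at        | email             |
|------|---------------------|-------------------|
| 1    | 2023-06-01 14:20:00 | john@example.com  |
| 2    | 2023-06-22 12:00:00 | jane@example.com  |
| 3    | 2023-06-20 15:30:00 | NULL              |
| NULL | NULL                | david@example.com |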

In the above example, we can see that the query returned unmatched records for both the orders and customers tables alongside all of the records that matched.
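Some database systems (MySQL, for example) do not support the FULL JOIN clause. In that situation, one common workaround is to combine a LEFT JOIN with a second query that picks up the rows the first one missed. The following is a sketch of that approach under the assumption of the example tables above, not a drop-in replacement for every case:

```sql
-- First part: every order, with customer data where it exists.
SELECT orders.id, orders.submitted_at, customers.email
FROM orders
LEFT JOIN customers ON orders.customer_id = customers.id

UNION ALL

-- Second part: only the customers that no order references.
SELECT orders.id, orders.submitted_at, customers.email
FROM customers
LEFT JOIN orders ON orders.customer_id = customers.id
WHERE orders.id IS NULL;
```

With the example data, this should produce the same four rows as the FULL JOIN query.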

In summary, JOINs are a crucial tool when working with a relational database, allowing for greater insights when working with interconnected data. Mastering the four main types of joins ensures any table structure can be effectively navigated, allowing the full picture of a database to be realized. As always, if you have any questions, please feel free to leave a comment below.
