Skip to content

7. Joins

Switch to Zen Mode

In relational databases, data is typically distributed across several logically related tables. To create a usable result set, these data sources must be combined within a query. Primary and foreign keys play a crucial role here, as they simplify the connection between records across multiple tables. Linking different tables is a fundamental aspect of the relational data model.

The SELECT statement can be extended in various ways to allow simultaneous access to multiple tables. These queries can reference two or even more tables at once.

The datasets in individual tables represent sets of records (tuples). Using SQL statements, these sets can be combined in different ways.

If two or more union-compatible sets are added, the resulting dataset contains all records from all involved tables. It is also possible to form intersections to identify only those records present in all participating tables. The following three set operations are most commonly used when linking tables:

  • Union (UNION): Combines two or more tables into a single dataset. Duplicates are removed by default, but can be retained using UNION ALL. The union corresponds to the “union” operator in relational algebra.
  • Intersection (INTERSECT): Selects only those records from two or more sets that are identical in all sets. The intersection corresponds to the “intersection” operator in relational algebra.
  • Difference (EXCEPT / MINUS): Determines all records that are contained in one set but not in the other.

Linking multiple tables is referred to as a Join. Joins are used to combine records from two or more tables into a single result set.

To continue with the examples for joins, we use the two tables students and grades. Note that the student “Charlie” has no grades yet, and there is a grade entry (ID 5) with a student_id that does not exist in our students table to better demonstrate how Joins handle missing matches.

student_idfirst_namelast_name
1AliceAdelson
2BobBurger
3CharlieCheck
grade_idstudent_idsubjectscore
11SEW1
21INSY2
32MEDT3
42Math1
599English2

Note: The student_id 99 in the grades table does not exist in the students table, which will be important when we look at how different types of joins handle unmatched records. Yes, this is intentional to illustrate the behavior of joins with unmatched records. This allows us to see how different types of joins (Inner Join, Left Join, Right Join, Full Outer Join) handle cases where there are records in one table that do not have corresponding matches in the other table. It helps to demonstrate the concept of joins more clearly by showing the effects of unmatched records on the resulting dataset. Normally, in a well-maintained database, you would expect foreign key constraints to prevent such discrepancies, but for educational purposes, this setup allows us to explore the behavior of joins in different scenarios.

If you want to test this in your environment, use the following code:

CREATE TABLE students (
student_id INT PRIMARY KEY,
first_name VARCHAR(50),
last_name VARCHAR(50)
);
CREATE TABLE grades (
grade_id INT PRIMARY KEY,
student_id INT,
subject VARCHAR(10),
score INT
);
INSERT INTO students VALUES
(1, 'Alice', 'Adelson'),
(2, 'Bob', 'Burger'),
(3, 'Charlie', 'Check');
INSERT INTO grades VALUES
(1, 1, 'SEW', 1),
(2, 1, 'INSY', 2),
(3, 2, 'MEDT', 3),
(4, 2, 'Math', 1),
(5, 99, 'English', 2);

The Inner Join is the most common type of join. It returns only the records that have matching values in both tables. If a row in the first table does not have a corresponding match in the second table (or vice versa), that row will not appear in the result set.

In our example, we link the students table with the grades table using the student_id.

SELECT s.first_name, s.last_name, g.subject, g.score
FROM students s
INNER JOIN grades g ON s.student_id = g.student_id;
first_namelast_namesubjectscore
AliceAdelsonSEW1
AliceAdelsonINSY2
BobBurgerMEDT3
BobBurgerMath1
  • Charlie Check is missing from the result because he has no entries in the grades table.
  • Grade ID 5 (English) is missing because the student_id 99 does not exist in the students table.
  • The INNER JOIN ensures that the result only contains “complete” data pairs where a student is clearly assigned to a grade.

The Left Join returns all records from the left table (students) and the matched records from the right table (grades).

If there is no match for a record from the left table, the result will still contain that record, but with NULL values for every column of the right table. This is particularly useful when you want to identify entities that have no associated data in a related table (e.g., students who haven’t received any grades yet).

SELECT s.first_name, s.last_name, g.subject, g.score
FROM students s
LEFT JOIN grades g ON s.student_id = g.student_id;
first_namelast_namesubjectscore
AliceAdelsonSEW1
AliceAdelsonINSY2
BobBurgerMEDT3
BobBurgerMath1
CharlieCheckNULLNULL
  • Charlie Check now appears in the list. Since he has no entries in the grades table, the columns subject and score are filled with NULL.
  • Grade ID 5 (English) is still missing. This is because a Left Join prioritizes the left table (students), and there is no student with ID 99 to pull that grade into the result.
  • This join type is essential for reports where you need a complete list of students, regardless of whether they have finished their assignments.

The Right Join is the exact mirror of the Left Join. It returns all records from the right table (grades) and the matched records from the left table (students).

If a record in the right table has no matching partner in the left table, the student columns will be filled with NULL. This is often used to find “orphaned” entries—data that exists in a sub-table but is no longer linked to a main record.

SELECT s.first_name, s.last_name, g.subject, g.score
FROM students s
RIGHT JOIN grades g ON s.student_id = g.student_id;
first_namelast_namesubjectscore
AliceAdelsonSEW1
AliceAdelsonINSY2
BobBurgerMEDT3
BobBurgerMath1
NULLNULLEnglish2

  • The “Orphaned” Grade: Grade ID 5 (English) now appears in the result, even though the student_id 99 doesn’t exist in our students table. The name columns are simply NULL.
  • Charlie Check is gone: Since Charlie has no entry in the grades table, he is excluded because the Right Join prioritizes the right side.
  • Practical Use: In many real-world scenarios, developers prefer using LEFT JOIN and simply switching the table order in the query to keep the logic easier to read from left-to-right.

The Full Outer Join combines the results of both the LEFT JOIN and the RIGHT JOIN. It returns all records when there is a match in either the left table (students) or the right table (grades).

If there is no match, the missing side will contain NULL values. This join provides a complete view of both tables, showing you matched data, students without grades, and grades without students all in one result set.

SELECT s.first_name, s.last_name, g.subject, g.score
FROM students s
FULL OUTER JOIN grades g ON s.student_id = g.student_id;
first_namelast_namesubjectscore
AliceAdelsonSEW1
AliceAdelsonINSY2
BobBurgerMEDT3
BobBurgerMath1
CharlieCheckNULLNULL
NULLNULLEnglish2
  • The “All-In” Approach: This is the only join that includes both Charlie Check (the student without grades) and the English grade (the grade without a student).
  • Usage in Databases: While very powerful, note that some database systems (like MySQL/MariaDB) do not support FULL OUTER JOIN directly. In those cases, you have to simulate it by using a UNION of a LEFT JOIN and a RIGHT JOIN.
  • Data Auditing: This join is excellent for data cleaning or auditing, as it immediately highlights missing relationships on both sides of your database schema.

The Cross Join is the most basic and expansive type of join. It produces a Cartesian Product of the two tables. This means every single row from the first table is combined with every single row from the second table.

Unlike the joins we discussed previously, the Cross Join usually does not use a comparison condition (the ON clause). If Table A has 3 rows and Table B has 5 rows, the result of a Cross Join will be 15 rows (3 × 5).

SELECT s.first_name, s.last_name, g.subject
FROM students s
CROSS JOIN grades g;

(Note: In some SQL dialects, you can achieve the same result by simply listing both tables separated by a comma: SELECT ... FROM students, grades;)

Since the full result would have 15 rows (3 students × 5 grades), here is a sample of what it looks like:

first_namelast_namesubject
AliceAdelsonSEW
AliceAdelsonINSY
AliceAdelsonMEDT
AliceAdelsonMath
AliceAdelsonEnglish
BobBurgerSEW
BobBurgerINSY
CharlieCheckEnglish
  • No Logic Needed: The database doesn’t check if the student_id matches. It simply pairs everyone with everything.
  • Size Explosion: With large tables, a Cross Join can quickly lead to massive result sets that might crash an application or slow down the server.
  • Practical Use: While rarely used for standard reports, Cross Joins are useful for generating test data or creating exhaustive combinations (e.g., matching every possible “Color” with every possible “Size” for a clothing inventory).

A Natural Join is a specialized type of join that automatically joins tables based on all columns that have the same name in both tables.

When using a NATURAL JOIN, you do not need to specify an ON clause because the database “guesses” the relationship by looking for identical column headers.

SELECT first_name, last_name, subject, score
FROM students
NATURAL JOIN grades;
  1. The database looks at both tables (students and grades).
  2. It identifies that both tables have a column named student_id.
  3. It performs an Inner Join using students.student_id = grades.student_id.
  4. Crucially, it only includes the join column (student_id) once in the result set, even if you used SELECT *.
first_namelast_namesubjectscore
AliceAdelsonSEW1
AliceAdelsonINSY2
BobBurgerMEDT3
BobBurgerMath1

Important Warnings (Why it’s rarely used in production):

Section titled “Important Warnings (Why it’s rarely used in production):”

While it looks “cleaner” because the code is shorter, it is often avoided in professional software development for several reasons:

  • Hidden Logic: If someone adds a new column to both tables (e.g., last_update) for internal tracking, the NATURAL JOIN will suddenly try to join on that column too. This will likely break your query or return zero results.
  • Column Name Conflicts: If two tables have a column named id or description that actually mean different things, a Natural Join will incorrectly try to link them.
  • Readability: For a developer reading your code, it’s not immediately clear which columns are being used for the connection without looking up the schema of both tables.

In practice always prefer the explicit INNER JOIN ... ON ... to stay in control of your data.

A Semi-Join is a bit different from the joins we have discussed so far. While standard joins (Inner, Left, etc.) combine columns from both tables, a Semi-Join returns rows from the left table if there is a match in the right table, but it does not append any columns from the right table.

In simple terms: It checks if a record has a partner, but it doesn’t “bring the partner home.”

You use a Semi-Join when you only care if a relationship exists, but you don’t need the actual data from the second table. It also prevents duplicate rows if a student has multiple grades (unlike an Inner Join).

In SQL, there is no SEMI JOIN keyword. Instead, it is usually implemented using the EXISTS operator or the IN operator.

Example - Finding all students who have at least one grade:

SELECT *
FROM students s
WHERE EXISTS (
SELECT 1
FROM grades g
WHERE g.student_id = s.student_id
);
student_idfirst_namelast_name
1AliceAdelson
2BobBurger
  • No Duplicates: Even though Alice has two grades (SEW and INSY), she only appears once in the result. In an INNER JOIN, she would appear twice.
  • No Grade Info: The result only contains columns from the students table. You cannot see the subjects or the scores.
  • The “Anti-Join”: The opposite is the Anti-Join (using NOT EXISTS). This would return only Charlie Check, as he is the only one who does not have a grade.
FeatureInner JoinSemi-Join (EXISTS)
ColumnsFrom both tablesFrom left table only
DuplicatesAlice appears for every gradeAlice appears only once
GoalCombine dataFilter data

An Anti-Join is the logical opposite of a Semi-Join. It returns rows from the left table only if there is no match in the right table.

In our school context, this is the perfect tool to find “orphans”—for example, students who have not yet received any grades or subjects that aren’t assigned to any existing students.

Similar to the Semi-Join, most SQL dialects do not have an ANTI JOIN keyword. It is typically implemented using NOT EXISTS or NOT IN.

Example - Finding students who have no grades assigned:

SELECT *
FROM students s
WHERE NOT EXISTS (
SELECT 1
FROM grades g
WHERE g.student_id = s.student_id
);
student_idfirst_namelast_name
3CharlieCheck

Why not just use a Left Join with a NULL check?

Section titled “Why not just use a Left Join with a NULL check?”

You can actually achieve an Anti-Join result using a LEFT JOIN, but it is often less efficient and slightly harder to read:

SELECT s.*
FROM students s
LEFT JOIN grades g ON s.student_id = g.student_id
WHERE g.grade_id IS NULL;

Benefits of using NOT EXISTS (Anti-Join logic):

  • Performance: The database engine can stop searching for a specific student as soon as it finds a single grade.
  • Clarity: It clearly expresses the intent: “Select students where no grade records exist.”

In real-world database schemas, information is often spread across more than two tables, forming a web of interconnected tables. So, we need to join more than two tables at once.

However, the logic remains the same: each join creates a temporary result set that is then joined with the next table in the sequence.

To demonstrate this, let’s add a third table that links subjects to teachers.

teacher_idnamesubject
101GabrielSEW
102MaurhartINSY
103SteindlMath

When joining multiple tables, you simply append additional JOIN and ON clauses. The order usually follows the logical relationship (Student ➡️ Grade ➡️ Teacher).

SELECT
s.first_name,
s.last_name,
g.subject,
g.score,
t.name AS teacher_name
FROM students s
INNER JOIN grades g ON s.student_id = g.student_id
INNER JOIN teachers t ON g.subject = t.subject;
first_namelast_namesubjectscoreteacher_name
AliceAdelsonSEW1Gabriel
AliceAdelsonINSY2Maurhart
BobBurgerMath1Steindl
  • Logical Flow: The query first connects students to grades via student_id. Once that connection is established, it uses the subject column from the grades table to pull in the matching teacher from the teachers table.
  • The “Filter” Effect: Since we used INNER JOIN for both connections, any record missing a link at any point in the chain disappears. Notice that Bob’s “MEDT” grade is gone because there is no teacher for “MEDT” in our teachers table.
  • Mixing Join Types: You can mix and match. You could use a LEFT JOIN for the first connection (to keep all students) and an INNER JOIN for the second. However, be careful: a subsequent INNER JOIN can accidentally filter out the NULL rows produced by a previous LEFT JOIN.

In relational databases, data is typically distributed across multiple tables. To make this data usable, it must be combined. This chapter distinguishes between two primary methods: Set Operations (based on table structure) and Joins (based on logical relationships via primary and foreign keys).

These are used to combine tables that share an identical structure (union-compatible).

  • UNION: Merges two tables and removes duplicates (e.g., a combined list of projects from Villach and Klagenfurt).
  • INTERSECT: Returns only the commonalities (projects running in both cities).
  • EXCEPT / MINUS: Identifies the difference (projects existing in Villach but not in Klagenfurt).

Joins combine columns from different tables based on a specific condition (usually something like ON s.id = g.student_id).

  • Inner Join: The default case. Only records that have a matching partner in both tables appear in the result.
  • Left Join: Keeps all records from the left table, even if no corresponding data exists on the right (useful for finding students without grades).
  • Right Join: The mirror image of the Left Join—it keeps all records from the right table.
  • Full Outer Join: The “all-in” variant. It displays everything from both tables, filling in missing partners with NULL.
  • Cross Join: Produces a Cartesian product (every row paired with every other row). Caution: This can generate massive datasets!
  • Natural Join: Automatically connects tables via columns with identical names (risky in practice due to hidden logic).
  • Semi-Join & Anti-Join: These are used for filtering. A Semi-Join checks if a partner exists without displaying their data. An Anti-Join specifically finds records without a partner (e.g., students who haven’t received a single grade).

Learning Outcomes: What you should be able to do after this chapter

Section titled “Learning Outcomes: What you should be able to do after this chapter”

After completing this chapter, students should be able to:

  • Distinguish: Know when to use a set operation (UNION, INTERSECT, EXCEPT) versus a JOIN.
  • Apply: Confidently write the correct syntax for INNER, LEFT, and FULL JOIN.
  • Analyze: Predict which records will appear in the result set for each join type (specifically identifying NULL values).
  • Problem Solve: Create targeted queries to find “orphaned” records (e.g., using LEFT JOIN with IS NULL or NOT EXISTS).
  • Optimize: Understand why aliases (e.g., FROM students s) improve readability and why Natural Joins are avoided in professional environments.