7. Joins
Joins - Working with Multiple Tables
Section titled “Joins - Working with Multiple Tables”Data Across Multiple Tables
Section titled “Data Across Multiple Tables”In relational databases, data is typically distributed across several logically related tables. To create a usable result set, these data sources must be combined within a query. Primary and foreign keys play a crucial role here, as they simplify the connection between records across multiple tables. Linking different tables is a fundamental aspect of the relational data model.
The SELECT statement can be extended in various ways to allow simultaneous access to multiple tables. These queries can reference two or even more tables at once.
Linking Tables via Set Operations
Section titled “Linking Tables via Set Operations”The datasets in individual tables represent sets of records (tuples). Using SQL statements, these sets can be combined in different ways.
If two or more union-compatible sets are added, the resulting dataset contains all records from all involved tables. It is also possible to form intersections to identify only those records present in all participating tables. The following three set operations are most commonly used when linking tables:
- Union (
UNION): Combines two or more tables into a single dataset. Duplicates are removed by default, but can be retained usingUNION ALL. The union corresponds to the “union” operator in relational algebra.
- Intersection (
INTERSECT): Selects only those records from two or more sets that are identical in all sets. The intersection corresponds to the “intersection” operator in relational algebra.
- Difference (
EXCEPT/MINUS): Determines all records that are contained in one set but not in the other.
Joining Tables (Joins)
Section titled “Joining Tables (Joins)”Linking multiple tables is referred to as a Join. Joins are used to combine records from two or more tables into a single result set.
To continue with the examples for joins, we use the two tables students and grades. Note that the student “Charlie” has no grades yet, and there is a grade entry (ID 5) with a student_id that does not exist in our students table to better demonstrate how Joins handle missing matches.
Example Tables
Section titled “Example Tables”Table: students
Section titled “Table: students”| student_id | first_name | last_name |
|---|---|---|
| 1 | Alice | Adelson |
| 2 | Bob | Burger |
| 3 | Charlie | Check |
Table: grades
Section titled “Table: grades”| grade_id | student_id | subject | score |
|---|---|---|---|
| 1 | 1 | SEW | 1 |
| 2 | 1 | INSY | 2 |
| 3 | 2 | MEDT | 3 |
| 4 | 2 | Math | 1 |
| 5 | 99 | English | 2 |
Note: The student_id 99 in the grades table does not exist in the students table, which will be important when we look at how different types of joins handle unmatched records. Yes, this is intentional to illustrate the behavior of joins with unmatched records. This allows us to see how different types of joins (Inner Join, Left Join, Right Join, Full Outer Join) handle cases where there are records in one table that do not have corresponding matches in the other table. It helps to demonstrate the concept of joins more clearly by showing the effects of unmatched records on the resulting dataset. Normally, in a well-maintained database, you would expect foreign key constraints to prevent such discrepancies, but for educational purposes, this setup allows us to explore the behavior of joins in different scenarios.
SQL Setup
Section titled “SQL Setup”If you want to test this in your environment, use the following code:
CREATE TABLE students ( student_id INT PRIMARY KEY, first_name VARCHAR(50), last_name VARCHAR(50));
CREATE TABLE grades ( grade_id INT PRIMARY KEY, student_id INT, subject VARCHAR(10), score INT);
INSERT INTO students VALUES(1, 'Alice', 'Adelson'),(2, 'Bob', 'Burger'),(3, 'Charlie', 'Check');
INSERT INTO grades VALUES(1, 1, 'SEW', 1),(2, 1, 'INSY', 2),(3, 2, 'MEDT', 3),(4, 2, 'Math', 1),(5, 99, 'English', 2);Inner Join
Section titled “Inner Join”The Inner Join is the most common type of join. It returns only the records that have matching values in both tables. If a row in the first table does not have a corresponding match in the second table (or vice versa), that row will not appear in the result set.
In our example, we link the students table with the grades table using the student_id.
SQL Syntax
Section titled “SQL Syntax”SELECT s.first_name, s.last_name, g.subject, g.scoreFROM students sINNER JOIN grades g ON s.student_id = g.student_id;Resulting Table
Section titled “Resulting Table”| first_name | last_name | subject | score |
|---|---|---|---|
| Alice | Adelson | SEW | 1 |
| Alice | Adelson | INSY | 2 |
| Bob | Burger | MEDT | 3 |
| Bob | Burger | Math | 1 |
Key Observations:
Section titled “Key Observations:”- Charlie Check is missing from the result because he has no entries in the
gradestable. - Grade ID 5 (English) is missing because the
student_id99 does not exist in thestudentstable. - The
INNER JOINensures that the result only contains “complete” data pairs where a student is clearly assigned to a grade.
Left Join (Left Outer Join)
Section titled “Left Join (Left Outer Join)”The Left Join returns all records from the left table (students) and the matched records from the right table (grades).
If there is no match for a record from the left table, the result will still contain that record, but with NULL values for every column of the right table. This is particularly useful when you want to identify entities that have no associated data in a related table (e.g., students who haven’t received any grades yet).
SQL Syntax
Section titled “SQL Syntax”SELECT s.first_name, s.last_name, g.subject, g.scoreFROM students sLEFT JOIN grades g ON s.student_id = g.student_id;Resulting Table
Section titled “Resulting Table”| first_name | last_name | subject | score |
|---|---|---|---|
| Alice | Adelson | SEW | 1 |
| Alice | Adelson | INSY | 2 |
| Bob | Burger | MEDT | 3 |
| Bob | Burger | Math | 1 |
| Charlie | Check | NULL | NULL |
Key Observations:
Section titled “Key Observations:”- Charlie Check now appears in the list. Since he has no entries in the
gradestable, the columnssubjectandscoreare filled withNULL. - Grade ID 5 (English) is still missing. This is because a Left Join prioritizes the left table (
students), and there is no student with ID 99 to pull that grade into the result. - This join type is essential for reports where you need a complete list of students, regardless of whether they have finished their assignments.
Right Join (Right Outer Join)
Section titled “Right Join (Right Outer Join)”The Right Join is the exact mirror of the Left Join. It returns all records from the right table (grades) and the matched records from the left table (students).
If a record in the right table has no matching partner in the left table, the student columns will be filled with NULL. This is often used to find “orphaned” entries—data that exists in a sub-table but is no longer linked to a main record.
SQL Syntax
Section titled “SQL Syntax”SELECT s.first_name, s.last_name, g.subject, g.scoreFROM students sRIGHT JOIN grades g ON s.student_id = g.student_id;Resulting Table
Section titled “Resulting Table”| first_name | last_name | subject | score |
|---|---|---|---|
| Alice | Adelson | SEW | 1 |
| Alice | Adelson | INSY | 2 |
| Bob | Burger | MEDT | 3 |
| Bob | Burger | Math | 1 |
| NULL | NULL | English | 2 |
Key Observations:
Section titled “Key Observations:”- The “Orphaned” Grade: Grade ID 5 (English) now appears in the result, even though the
student_id99 doesn’t exist in ourstudentstable. The name columns are simplyNULL. - Charlie Check is gone: Since Charlie has no entry in the
gradestable, he is excluded because the Right Join prioritizes the right side. - Practical Use: In many real-world scenarios, developers prefer using
LEFT JOINand simply switching the table order in the query to keep the logic easier to read from left-to-right.
Full Outer Join
Section titled “Full Outer Join”The Full Outer Join combines the results of both the LEFT JOIN and the RIGHT JOIN. It returns all records when there is a match in either the left table (students) or the right table (grades).
If there is no match, the missing side will contain NULL values. This join provides a complete view of both tables, showing you matched data, students without grades, and grades without students all in one result set.
SQL Syntax
Section titled “SQL Syntax”SELECT s.first_name, s.last_name, g.subject, g.scoreFROM students sFULL OUTER JOIN grades g ON s.student_id = g.student_id;Resulting Table
Section titled “Resulting Table”| first_name | last_name | subject | score |
|---|---|---|---|
| Alice | Adelson | SEW | 1 |
| Alice | Adelson | INSY | 2 |
| Bob | Burger | MEDT | 3 |
| Bob | Burger | Math | 1 |
| Charlie | Check | NULL | NULL |
| NULL | NULL | English | 2 |
Key Observations:
Section titled “Key Observations:”- The “All-In” Approach: This is the only join that includes both Charlie Check (the student without grades) and the English grade (the grade without a student).
- Usage in Databases: While very powerful, note that some database systems (like MySQL/MariaDB) do not support
FULL OUTER JOINdirectly. In those cases, you have to simulate it by using aUNIONof aLEFT JOINand aRIGHT JOIN. - Data Auditing: This join is excellent for data cleaning or auditing, as it immediately highlights missing relationships on both sides of your database schema.
Cross Join
Section titled “Cross Join”The Cross Join is the most basic and expansive type of join. It produces a Cartesian Product of the two tables. This means every single row from the first table is combined with every single row from the second table.
Unlike the joins we discussed previously, the Cross Join usually does not use a comparison condition (the ON clause). If Table A has 3 rows and Table B has 5 rows, the result of a Cross Join will be 15 rows (3 × 5).
SQL Syntax
Section titled “SQL Syntax”SELECT s.first_name, s.last_name, g.subjectFROM students sCROSS JOIN grades g;(Note: In some SQL dialects, you can achieve the same result by simply listing both tables separated by a comma: SELECT ... FROM students, grades;)
Resulting Table (Excerpt)
Section titled “Resulting Table (Excerpt)”Since the full result would have 15 rows (3 students × 5 grades), here is a sample of what it looks like:
| first_name | last_name | subject |
|---|---|---|
| Alice | Adelson | SEW |
| Alice | Adelson | INSY |
| Alice | Adelson | MEDT |
| Alice | Adelson | Math |
| Alice | Adelson | English |
| Bob | Burger | SEW |
| Bob | Burger | INSY |
| … | … | … |
| Charlie | Check | English |
Key Observations:
Section titled “Key Observations:”- No Logic Needed: The database doesn’t check if the
student_idmatches. It simply pairs everyone with everything. - Size Explosion: With large tables, a Cross Join can quickly lead to massive result sets that might crash an application or slow down the server.
- Practical Use: While rarely used for standard reports, Cross Joins are useful for generating test data or creating exhaustive combinations (e.g., matching every possible “Color” with every possible “Size” for a clothing inventory).
Natural Join
Section titled “Natural Join”A Natural Join is a specialized type of join that automatically joins tables based on all columns that have the same name in both tables.
When using a NATURAL JOIN, you do not need to specify an ON clause because the database “guesses” the relationship by looking for identical column headers.
SQL Syntax
Section titled “SQL Syntax”SELECT first_name, last_name, subject, scoreFROM studentsNATURAL JOIN grades;How it works in our example:
Section titled “How it works in our example:”- The database looks at both tables (
studentsandgrades). - It identifies that both tables have a column named
student_id. - It performs an Inner Join using
students.student_id = grades.student_id. - Crucially, it only includes the join column (
student_id) once in the result set, even if you usedSELECT *.
Resulting Table
Section titled “Resulting Table”| first_name | last_name | subject | score |
|---|---|---|---|
| Alice | Adelson | SEW | 1 |
| Alice | Adelson | INSY | 2 |
| Bob | Burger | MEDT | 3 |
| Bob | Burger | Math | 1 |
Important Warnings (Why it’s rarely used in production):
Section titled “Important Warnings (Why it’s rarely used in production):”While it looks “cleaner” because the code is shorter, it is often avoided in professional software development for several reasons:
- Hidden Logic: If someone adds a new column to both tables (e.g.,
last_update) for internal tracking, theNATURAL JOINwill suddenly try to join on that column too. This will likely break your query or return zero results. - Column Name Conflicts: If two tables have a column named
idordescriptionthat actually mean different things, a Natural Join will incorrectly try to link them. - Readability: For a developer reading your code, it’s not immediately clear which columns are being used for the connection without looking up the schema of both tables.
In practice always prefer the explicit INNER JOIN ... ON ... to stay in control of your data.
Semi-Join
Section titled “Semi-Join”A Semi-Join is a bit different from the joins we have discussed so far. While standard joins (Inner, Left, etc.) combine columns from both tables, a Semi-Join returns rows from the left table if there is a match in the right table, but it does not append any columns from the right table.
In simple terms: It checks if a record has a partner, but it doesn’t “bring the partner home.”
Why use it?
Section titled “Why use it?”You use a Semi-Join when you only care if a relationship exists, but you don’t need the actual data from the second table. It also prevents duplicate rows if a student has multiple grades (unlike an Inner Join).
SQL Syntax
Section titled “SQL Syntax”In SQL, there is no SEMI JOIN keyword. Instead, it is usually implemented using the EXISTS operator or the IN operator.
Example - Finding all students who have at least one grade:
SELECT *FROM students sWHERE EXISTS ( SELECT 1 FROM grades g WHERE g.student_id = s.student_id);Resulting Table
Section titled “Resulting Table”| student_id | first_name | last_name |
|---|---|---|
| 1 | Alice | Adelson |
| 2 | Bob | Burger |
Key Observations:
Section titled “Key Observations:”- No Duplicates: Even though Alice has two grades (SEW and INSY), she only appears once in the result. In an
INNER JOIN, she would appear twice. - No Grade Info: The result only contains columns from the
studentstable. You cannot see the subjects or the scores. - The “Anti-Join”: The opposite is the Anti-Join (using
NOT EXISTS). This would return only Charlie Check, as he is the only one who does not have a grade.
Comparison: Inner Join vs. Semi-Join
Section titled “Comparison: Inner Join vs. Semi-Join”| Feature | Inner Join | Semi-Join (EXISTS) |
|---|---|---|
| Columns | From both tables | From left table only |
| Duplicates | Alice appears for every grade | Alice appears only once |
| Goal | Combine data | Filter data |
Anti-Join
Section titled “Anti-Join”An Anti-Join is the logical opposite of a Semi-Join. It returns rows from the left table only if there is no match in the right table.
In our school context, this is the perfect tool to find “orphans”—for example, students who have not yet received any grades or subjects that aren’t assigned to any existing students.
SQL Syntax
Section titled “SQL Syntax”Similar to the Semi-Join, most SQL dialects do not have an ANTI JOIN keyword. It is typically implemented using NOT EXISTS or NOT IN.
Example - Finding students who have no grades assigned:
SELECT *FROM students sWHERE NOT EXISTS ( SELECT 1 FROM grades g WHERE g.student_id = s.student_id);Resulting Table
Section titled “Resulting Table”| student_id | first_name | last_name |
|---|---|---|
| 3 | Charlie | Check |
Why not just use a Left Join with a NULL check?
Section titled “Why not just use a Left Join with a NULL check?”You can actually achieve an Anti-Join result using a LEFT JOIN, but it is often less efficient and slightly harder to read:
SELECT s.*FROM students sLEFT JOIN grades g ON s.student_id = g.student_idWHERE g.grade_id IS NULL;Benefits of using NOT EXISTS (Anti-Join logic):
- Performance: The database engine can stop searching for a specific student as soon as it finds a single grade.
- Clarity: It clearly expresses the intent: “Select students where no grade records exist.”
Joining More Than Two Tables
Section titled “Joining More Than Two Tables”In real-world database schemas, information is often spread across more than two tables, forming a web of interconnected tables. So, we need to join more than two tables at once.
However, the logic remains the same: each join creates a temporary result set that is then joined with the next table in the sequence.
Example Extension: Table teachers
Section titled “Example Extension: Table teachers”To demonstrate this, let’s add a third table that links subjects to teachers.
| teacher_id | name | subject |
|---|---|---|
| 101 | Gabriel | SEW |
| 102 | Maurhart | INSY |
| 103 | Steindl | Math |
SQL Syntax: The Chain Reaction
Section titled “SQL Syntax: The Chain Reaction”When joining multiple tables, you simply append additional JOIN and ON clauses. The order usually follows the logical relationship (Student ➡️ Grade ➡️ Teacher).
SELECT s.first_name, s.last_name, g.subject, g.score, t.name AS teacher_nameFROM students sINNER JOIN grades g ON s.student_id = g.student_idINNER JOIN teachers t ON g.subject = t.subject;Resulting Table
Section titled “Resulting Table”| first_name | last_name | subject | score | teacher_name |
|---|---|---|---|---|
| Alice | Adelson | SEW | 1 | Gabriel |
| Alice | Adelson | INSY | 2 | Maurhart |
| Bob | Burger | Math | 1 | Steindl |
Key Observations:
Section titled “Key Observations:”- Logical Flow: The query first connects
studentstogradesviastudent_id. Once that connection is established, it uses thesubjectcolumn from thegradestable to pull in the matchingteacherfrom theteacherstable. - The “Filter” Effect: Since we used
INNER JOINfor both connections, any record missing a link at any point in the chain disappears. Notice that Bob’s “MEDT” grade is gone because there is no teacher for “MEDT” in ourteacherstable. - Mixing Join Types: You can mix and match. You could use a
LEFT JOINfor the first connection (to keep all students) and anINNER JOINfor the second. However, be careful: a subsequentINNER JOINcan accidentally filter out theNULLrows produced by a previousLEFT JOIN.
Summary: Joins & Set Operations
Section titled “Summary: Joins & Set Operations”In relational databases, data is typically distributed across multiple tables. To make this data usable, it must be combined. This chapter distinguishes between two primary methods: Set Operations (based on table structure) and Joins (based on logical relationships via primary and foreign keys).
1. Set Operations
Section titled “1. Set Operations”These are used to combine tables that share an identical structure (union-compatible).
- UNION: Merges two tables and removes duplicates (e.g., a combined list of projects from Villach and Klagenfurt).
- INTERSECT: Returns only the commonalities (projects running in both cities).
- EXCEPT / MINUS: Identifies the difference (projects existing in Villach but not in Klagenfurt).
2. Key Join Types
Section titled “2. Key Join Types”Joins combine columns from different tables based on a specific condition (usually something like ON s.id = g.student_id).
- Inner Join: The default case. Only records that have a matching partner in both tables appear in the result.
- Left Join: Keeps all records from the left table, even if no corresponding data exists on the right (useful for finding students without grades).
- Right Join: The mirror image of the Left Join—it keeps all records from the right table.
- Full Outer Join: The “all-in” variant. It displays everything from both tables, filling in missing partners with
NULL. - Cross Join: Produces a Cartesian product (every row paired with every other row). Caution: This can generate massive datasets!
3. Specialized Forms
Section titled “3. Specialized Forms”- Natural Join: Automatically connects tables via columns with identical names (risky in practice due to hidden logic).
- Semi-Join & Anti-Join: These are used for filtering. A Semi-Join checks if a partner exists without displaying their data. An Anti-Join specifically finds records without a partner (e.g., students who haven’t received a single grade).
Learning Outcomes: What you should be able to do after this chapter
Section titled “Learning Outcomes: What you should be able to do after this chapter”After completing this chapter, students should be able to:
- Distinguish: Know when to use a set operation (
UNION,INTERSECT,EXCEPT) versus aJOIN. - Apply: Confidently write the correct syntax for
INNER,LEFT, andFULL JOIN. - Analyze: Predict which records will appear in the result set for each join type (specifically identifying NULL values).
- Problem Solve: Create targeted queries to find “orphaned” records (e.g., using
LEFT JOINwithIS NULLorNOT EXISTS). - Optimize: Understand why aliases (e.g.,
FROM students s) improve readability and whyNatural Joinsare avoided in professional environments.