9. Subqueries
Subqueries in SQL
Section titled “Subqueries in SQL”Subqueries - The Power of Nested Logic
Section titled “Subqueries - The Power of Nested Logic”Up to this point, you have learned how to retrieve, join, and aggregate data. However, in the real world, you often face a “chicken and egg” problem: You want to filter data based on a value that you don’t even know yet.
For example, imagine you are asked to find all players who scored more than the average. To answer this, you would normally need two steps:
- Run a query to find the average score (e.g., 42).
- Run a second query using that number:
WHERE score > 42.
Subqueries allow you to combine these two steps into one elegant, dynamic instruction. Instead of hardcoding a number like “42”, you embed the first query directly into the second one. This makes your SQL code “intelligent” and adaptable to changing data.
Think of a Subquery as a query within a query - a specialized tool that allows you to solve complex, multi-layered questions in a single pass. It is the step where you move from simple data retrieval to building sophisticated analytical logic.
Types of Subqueries
Section titled “Types of Subqueries”In SQL, the way you use a subquery depends entirely on the structure of the data it returns. We distinguish between subqueries that return a single value (Scalar) and those that return a list or an entire set (Table).
Scalar Subqueries (Returning a Single Value)
Section titled “Scalar Subqueries (Returning a Single Value)”A Scalar Subquery returns exactly one row and one column. Because it results in a single value (like a specific date or a number), you can use it with standard comparison operators like =, >, or <.
Example: Finding students younger than the average
To find all students who are younger than the average age of all students, the database must first calculate the average birth year. Students with a higher birth year were born more recently and are therefore younger.
SELECT first_name, last_name, birth_dateFROM studentsWHERE YEAR(birth_date) > (SELECT AVG(YEAR(birth_date)) FROM students);- The Inner Query: Calculates the average birth year across all students.
- The Outer Query: Compares each student’s birth year against that single calculated value — students born in a later year are younger.
Table Subqueries (Returning Multiple Values)
Section titled “Table Subqueries (Returning Multiple Values)”A Table Subquery returns a list (one column, multiple rows) or even a full virtual table. Since you are dealing with multiple values, you use set-based operators like IN, ANY, or ALL.
A) Using the IN Operator
Section titled “A) Using the IN Operator”This is the most frequent use case for subqueries. It checks if a value from the outer query exists within the result list of the inner query.
Example: Finding students of a specific form teacher (Klassenvorstand)
Suppose you want to find all students whose form teacher is ‘Mr. Maurhart’, but the teacher’s name is only stored in the teachers table.
SELECT first_name, last_name, classFROM studentsWHERE class IN ( SELECT class_name FROM teachers WHERE teacher_name = 'Maurhart');B) Subqueries in the FROM Clause (Derived Tables)
Section titled “B) Subqueries in the FROM Clause (Derived Tables)”You can use a subquery to create a temporary “table” and then treat it like a normal table.
Note: In most SQL dialects, a subquery in the
FROMclause must be given an alias (a temporary name).
Example: Counting students in a pre-filtered list If you first want to filter a specific group and then count them:
SELECT COUNT(*) AS junior_countFROM ( SELECT * FROM students WHERE birth_date > '2012-01-01') AS young_students;Correlated Subqueries
Section titled “Correlated Subqueries”A Correlated Subquery is a special kind of subquery that references one or more columns from the outer query. This creates a live link between the inner and the outer query.
The key difference from a regular subquery is that it depends on the outer query and cannot be evaluated on its own:
| Regular Subquery | Correlated Subquery | |
|---|---|---|
| Self-contained? | Yes, independent of the outer query | No, needs values from the outer query |
| Result | A fixed value or list | A value that depends on the current outer row |
| References outer query? | No | Yes |
Think of it like this: the result of the inner query depends on the data of the specific student the outer query is currently looking at.
Example: Finding students who scored above their own class average
Instead of comparing every student against the school-wide average, you want to find those who beat the average of their specific class.
SELECT first_name, last_name, class, gradeFROM students AS s_outerWHERE grade > ( SELECT AVG(grade) FROM students AS s_inner WHERE s_inner.class = s_outer.class -- link to the current outer row);- The link:
s_inner.class = s_outer.class— the inner query “looks back” at the row currently being examined by the outer query. - What this means: For a student in “3AHIF” the inner query gives the average only for 3AHIF; for a student in “3BHIF” it gives the average for “3BHIF”, and so on.
Using EXISTS with Correlated Subqueries
Section titled “Using EXISTS with Correlated Subqueries”The EXISTS operator is commonly paired with correlated subqueries. It does not return data from the inner query — it only checks whether the inner query produces any rows at all. If it does, the result is TRUE.
Example: Find all teachers who have at least one student in their class
SELECT teacher_name, class_nameFROM teachers AS tWHERE EXISTS ( SELECT 1 FROM students AS s WHERE s.class = t.class_name -- link: any student in this teacher's class?);SELECT 1is a common shorthand — the actual value returned does not matter, only whether rows exist.- This is often more readable than a long
IN (...)list when checking for the existence of a relationship.
Performance note: On very large tables, correlated subqueries can be slower than other formulations. For typical school-sized and mid-sized datasets they are a perfectly valid and clear solution.
Comparison: When to use which?
Section titled “Comparison: When to use which?”| Feature | Scalar Subquery | Table Subquery | Correlated Subquery |
|---|---|---|---|
| Result | 1 Row, 1 Column | Multiple Rows (List or Table) | Depends on the current outer row |
| Common Operators | =, >, <, <> | IN, NOT IN, FROM | EXISTS, =, > |
| References outer query? | No | No | Yes |
| School Example | ”… above the school average." | "… in a list of specific classes." | "… above their own class average.” |
Summary: Subqueries
Section titled “Summary: Subqueries”In SQL, a Subquery is a “query within a query.” It allows you to solve complex questions in a single step by using the result of one query to help the main (outer) query filter or process data. This is especially useful when you need to filter data based on a value you don’t know yet—like an average or a specific list.
1. The Core Concept
Section titled “1. The Core Concept”Subqueries make your SQL code dynamic. Instead of manually looking up a value and then typing it into a second query, the subquery calculates that value “on the fly.”
- Inner Query: Provides a result (such as a value or a list).
- Outer Query: Uses that result to produce the final data set.
2. Types of Subqueries
Section titled “2. Types of Subqueries”How you use a subquery depends on what it returns and whether it references the outer query:
- Scalar Subqueries (Single Value):
- Returns exactly one value (e.g., the average grade).
- Used with standard operators like
=,>, or<. - Example: Finding all students whose score is higher than the school-wide average.
- Table Subqueries (Lists or Tables):
- Returns a list of values or a temporary table.
- Used with set operators like
INor in theFROMclause. - Example: Finding all students who belong to a list of classes managed by a specific teacher.
- Correlated Subqueries:
- References a column from the outer query, so its result depends on the current outer row.
- Used with
EXISTS, or comparison operators when the compared value depends on the current row. - Example: Finding students who scored above the average of their own class.
3. Quick Comparison
Section titled “3. Quick Comparison”| Feature | Scalar Subquery | Table Subquery | Correlated Subquery |
|---|---|---|---|
| Result | 1 Row, 1 Column | Multiple Rows | Depends on the current outer row |
| Operators | =, >, <, <> | IN, EXISTS, FROM | EXISTS, =, > |
| References outer query? | No | No | Yes |
| Goal | Compare to a single metric | Filter against a list or group | Compare each row to its own group metric |
4. Derived Tables
Section titled “4. Derived Tables”You can also put a subquery in the FROM clause. This creates a temporary “virtual table” that you can then query like a normal table. In most SQL systems, you must give this temporary table a nickname (an alias) to make it work.
Learning Outcomes: What you should be able to do
Section titled “Learning Outcomes: What you should be able to do”- Identify: Recognize when a problem requires two steps (e.g., “Find the average, then filter”) and solve it with one subquery.
- Compare: Use scalar subqueries to filter data against single values like
MAX()orAVG(). - Filter with Lists: Use the
INoperator to connect tables without using a standard JOIN when only a filter is needed. - Structure: Correctly place subqueries in the
WHEREorFROMclauses depending on the goal. - Correlate: Write a correlated subquery that links the inner query to the current outer row (e.g., “above their own class average”) and understand that its result depends on the current outer row.
- Existence Check: Use
EXISTSwith a correlated subquery to efficiently test whether a related row is present in another table.