Skip to content

9. Subqueries

Switch to Zen Mode

Up to this point, you have learned how to retrieve, join, and aggregate data. However, in the real world, you often face a “chicken and egg” problem: You want to filter data based on a value that you don’t even know yet.

For example, imagine you are asked to find all players who scored more than the average. To answer this, you would normally need two steps:

  1. Run a query to find the average score (e.g., 42).
  2. Run a second query using that number: WHERE score > 42.

Subqueries allow you to combine these two steps into one elegant, dynamic instruction. Instead of hardcoding a number like “42”, you embed the first query directly into the second one. This makes your SQL code “intelligent” and adaptable to changing data.

Think of a Subquery as a query within a query - a specialized tool that allows you to solve complex, multi-layered questions in a single pass. It is the step where you move from simple data retrieval to building sophisticated analytical logic.

In SQL, the way you use a subquery depends entirely on the structure of the data it returns. We distinguish between subqueries that return a single value (Scalar) and those that return a list or an entire set (Table).

Scalar Subqueries (Returning a Single Value)

Section titled “Scalar Subqueries (Returning a Single Value)”

A Scalar Subquery returns exactly one row and one column. Because it results in a single value (like a specific date or a number), you can use it with standard comparison operators like =, >, or <.

Example: Finding students younger than the average

To find all students who are younger than the average age of all students, the database must first calculate the average birth year. Students with a higher birth year were born more recently and are therefore younger.

SELECT first_name, last_name, birth_date
FROM students
WHERE YEAR(birth_date) > (SELECT AVG(YEAR(birth_date)) FROM students);
  • The Inner Query: Calculates the average birth year across all students.
  • The Outer Query: Compares each student’s birth year against that single calculated value — students born in a later year are younger.

Table Subqueries (Returning Multiple Values)

Section titled “Table Subqueries (Returning Multiple Values)”

A Table Subquery returns a list (one column, multiple rows) or even a full virtual table. Since you are dealing with multiple values, you use set-based operators like IN, ANY, or ALL.

This is the most frequent use case for subqueries. It checks if a value from the outer query exists within the result list of the inner query.

Example: Finding students of a specific form teacher (Klassenvorstand)

Suppose you want to find all students whose form teacher is ‘Mr. Maurhart’, but the teacher’s name is only stored in the teachers table.

SELECT first_name, last_name, class
FROM students
WHERE class IN (
SELECT class_name
FROM teachers
WHERE teacher_name = 'Maurhart'
);

B) Subqueries in the FROM Clause (Derived Tables)

Section titled “B) Subqueries in the FROM Clause (Derived Tables)”

You can use a subquery to create a temporary “table” and then treat it like a normal table.

Note: In most SQL dialects, a subquery in the FROM clause must be given an alias (a temporary name).

Example: Counting students in a pre-filtered list If you first want to filter a specific group and then count them:

SELECT COUNT(*) AS junior_count
FROM (
SELECT * FROM students WHERE birth_date > '2012-01-01'
) AS young_students;

A Correlated Subquery is a special kind of subquery that references one or more columns from the outer query. This creates a live link between the inner and the outer query.

The key difference from a regular subquery is that it depends on the outer query and cannot be evaluated on its own:

Regular SubqueryCorrelated Subquery
Self-contained?Yes, independent of the outer queryNo, needs values from the outer query
ResultA fixed value or listA value that depends on the current outer row
References outer query?NoYes

Think of it like this: the result of the inner query depends on the data of the specific student the outer query is currently looking at.

Example: Finding students who scored above their own class average

Instead of comparing every student against the school-wide average, you want to find those who beat the average of their specific class.

SELECT first_name, last_name, class, grade
FROM students AS s_outer
WHERE grade > (
SELECT AVG(grade)
FROM students AS s_inner
WHERE s_inner.class = s_outer.class -- link to the current outer row
);
  • The link: s_inner.class = s_outer.class — the inner query “looks back” at the row currently being examined by the outer query.
  • What this means: For a student in “3AHIF” the inner query gives the average only for 3AHIF; for a student in “3BHIF” it gives the average for “3BHIF”, and so on.

The EXISTS operator is commonly paired with correlated subqueries. It does not return data from the inner query — it only checks whether the inner query produces any rows at all. If it does, the result is TRUE.

Example: Find all teachers who have at least one student in their class

SELECT teacher_name, class_name
FROM teachers AS t
WHERE EXISTS (
SELECT 1
FROM students AS s
WHERE s.class = t.class_name -- link: any student in this teacher's class?
);
  • SELECT 1 is a common shorthand — the actual value returned does not matter, only whether rows exist.
  • This is often more readable than a long IN (...) list when checking for the existence of a relationship.

Performance note: On very large tables, correlated subqueries can be slower than other formulations. For typical school-sized and mid-sized datasets they are a perfectly valid and clear solution.


FeatureScalar SubqueryTable SubqueryCorrelated Subquery
Result1 Row, 1 ColumnMultiple Rows (List or Table)Depends on the current outer row
Common Operators=, >, <, <>IN, NOT IN, FROMEXISTS, =, >
References outer query?NoNoYes
School Example”… above the school average.""… in a list of specific classes.""… above their own class average.”

In SQL, a Subquery is a “query within a query.” It allows you to solve complex questions in a single step by using the result of one query to help the main (outer) query filter or process data. This is especially useful when you need to filter data based on a value you don’t know yet—like an average or a specific list.

Subqueries make your SQL code dynamic. Instead of manually looking up a value and then typing it into a second query, the subquery calculates that value “on the fly.”

  • Inner Query: Provides a result (such as a value or a list).
  • Outer Query: Uses that result to produce the final data set.

How you use a subquery depends on what it returns and whether it references the outer query:

  • Scalar Subqueries (Single Value):
    • Returns exactly one value (e.g., the average grade).
    • Used with standard operators like =, >, or <.
    • Example: Finding all students whose score is higher than the school-wide average.
  • Table Subqueries (Lists or Tables):
    • Returns a list of values or a temporary table.
    • Used with set operators like IN or in the FROM clause.
    • Example: Finding all students who belong to a list of classes managed by a specific teacher.
  • Correlated Subqueries:
    • References a column from the outer query, so its result depends on the current outer row.
    • Used with EXISTS, or comparison operators when the compared value depends on the current row.
    • Example: Finding students who scored above the average of their own class.
FeatureScalar SubqueryTable SubqueryCorrelated Subquery
Result1 Row, 1 ColumnMultiple RowsDepends on the current outer row
Operators=, >, <, <>IN, EXISTS, FROMEXISTS, =, >
References outer query?NoNoYes
GoalCompare to a single metricFilter against a list or groupCompare each row to its own group metric

You can also put a subquery in the FROM clause. This creates a temporary “virtual table” that you can then query like a normal table. In most SQL systems, you must give this temporary table a nickname (an alias) to make it work.


Learning Outcomes: What you should be able to do

Section titled “Learning Outcomes: What you should be able to do”
  • Identify: Recognize when a problem requires two steps (e.g., “Find the average, then filter”) and solve it with one subquery.
  • Compare: Use scalar subqueries to filter data against single values like MAX() or AVG().
  • Filter with Lists: Use the IN operator to connect tables without using a standard JOIN when only a filter is needed.
  • Structure: Correctly place subqueries in the WHERE or FROM clauses depending on the goal.
  • Correlate: Write a correlated subquery that links the inner query to the current outer row (e.g., “above their own class average”) and understand that its result depends on the current outer row.
  • Existence Check: Use EXISTS with a correlated subquery to efficiently test whether a related row is present in another table.