Haroon Ashraf
Subquery example of Student table and ExamType table before we talk about correlated subquery example

Why do we need Correlated Subqueries in SQL

December 9, 2022 by

This article speaks about correlated subquery and why we need it when querying a SQL database.

Additionally, we are going to understand the difference between simple subqueries and correlated subqueries with the help of some easily understandable examples.

This article is equally suitable for beginners and professionals and offers an opportunity to think outside the box when analyzing the data and finding crucial answers keeping in mind that the way we analyze the data matters a lot.

Getting familiar with Subquery and Correlated Subquery

Let’s get started by first getting familiar with the simple subquery before we understand its special form and why we need it.

Interestingly, it takes us a very small amount of time to understand the importance of writing a query to find out your desired data-related answers from a database (such as an SQL database) but when it comes to the subquery, as a beginner, I know, it is not always easy to think of a scenario straight away that you can say this is the time to use a subquery.

However, the situation gets slightly more intricate (complex) if you try to think of solving a data problem by suggesting that this is the time for correlated subquery unless you have that particular knowledge of when and how to use it and this is what (the knowledge-based understanding) we are trying to equip you with in this article.

It is absolutely fine to revisit the definition of subquery perhaps in a different fashion to keep the interest before we define its counterpart.

What is Subquery?

A subquery is a query within a query. In simple words, it is using a query inside a query for specific data needs.

Microsoft Documentation

According to Microsoft documentation, a subquery is a query that is nested inside a SELECT, INSERT, UPDATE, or DELETE statement, or inside another subquery.

Another way to understand Subquery

Before we talk about correlated subquery, a simple example of the subquery is when you are interested to get important information from another table independently in a query that is already getting information for a table.

I hope I have not confused you, so let us rephrase this by saying a subquery is a type of query that contains data for a table and then can be used as a column (expression) side by side with other natural columns of another query that is retrieving data for another table.

You are going to get a better understanding once we start explaining this with examples.

Subquery Example

Suppose, there is a table called Student that contains information about the number of enrolled students in an educational institution and another table called ExamType which contains the total number of exam types available for all the students.

Now you are interested to view all the students of the institution but would like to see the number of total types of exams each of them has to go through. This is a genuine case of using a subquery because both parts of information lie in two different tables. So, you have to merge the query that shows a total number of exam types inside the same query that retrieves the total students.

Subquery example of Student table and ExamType table before we talk about correlated subquery example

The good news is that you can also test run this example by building the tables first in any sample SQL database of your choice as long as it is not your Production database.

Hold on, one might ask, why are we taking so much time to explain subquery while this topic is related to correlated subquery?

The reason is if you understand subquery, you can then enhance your understanding to build correlated subquery and, in this way, we are going to not only explain both but also be able to differentiate them easily.

Let us build the required tables with some data:

As discussed before, to build a subquery, we can build two independent queries first, so, let us write the script for the one that contains the total number of Exam Types:

The output is as follows:

Total Exam Type are 3

Now build the query to view the students’ data along with a total number of exam types by wrapping the above (ExamType) query as a column as follows:

The results are shown below:

The total exam types are three for each of the three students (Atif, Sam and Sarah).

We have successfully implemented a subquery to find out the desired data. The output shows that each enrolled student has to have three standard exams.

Inner Query Outer Query

Please remember that a subquery is also known as an inner query while the query it gets attached with is called an outer query.

In our example SELECT COUNT(*) FROM dbo.ExamType is the subquery (inner query) and the main query SELECT s.StudentId,s.Name,(SELECT COUNT(*) FROM dbo.ExamType) AS Total_ExamTypes FROM dbo.Student s where it is used as a column in the outer query.

Independence from the outer query (Subquery)

Here the most important point about this subquery is that it can be successfully run independently as we have just seen.

In other words, it is independent of the outer query. So, we can build it separately and then bind it later. Please keep this point in mind as this is going to make a big difference as you see the next topic.

What is Correlated Subquery?

A correlated subquery is a subquery that depends on the outer query and is evaluated for each instance of the outer query. One thing is clear it depends on the outer query unlike subquery, but we will explain this further in the upcoming sections of this article.

I am purposely skipping the example here for a reason as the next topic is more suitable to understand it with example and also to understand why we need it.

Why We Need It

In order to understand correlated subquery and why we need it we have to go back to subquery but this time with a different example because that example is going to pave the path for the one, we are interested in, so please be attentive.

We need to add a few more things to our database.

Adding Subject Reference Table and Student Subject Table

Let us now add the following two new tables with data:

  1. Subject Table (A reference table called Subject that stores the available subjects to the students who can choose as per their preference)
  2. StudentSubject Table (the table that maintains the information about the chosen subject of a student and the marks obtained in that subject along with the total marks)

Please write the following script to achieve the objective of adding the above-mentioned tables:

View the newly built Subject and Student Subject Tables

Let us view the tables by running the following T-SQL script against our sample database:

The results are shown below:

Subject table contains three subjects: Data Structure, Modern Databases and Business Intelligence while StudentSubject table contains student, subject, marks obtained and total marks

Finding Student Obtaining Marks More than Average Marks (Subquery)

Now if we were to find the star student who has obtained marks more than the average marks then the first thing, we will do is to build a query (we are talking about subquery example) to get average marks from the student subject table:

Next, run it to test it:

Average marks are 80

We have built a query that can be now used as a subquery in an outer query as follows:

The output is as follows:

Finding the student whose marks are greater than average marks

Finding Student Obtaining the Maximum Marks (Subquery)

Now just one more example of subquery where we are interested to find out the student with maximum marks.

Again, although arguable, but to make it simple, we need to break this problem into two steps:

  1. Finding the maximum marks
  2. Finding the student with maximum marks

Let us find the maximum marks by building an independent small query:

Now we use this as a subquery in the main query to get the student who has got the maximum marks:

The output is as follows:

Finding the student who has maximum marks (Sarah with 90 marks)

The Need for Correlated Subquery (Finding Student with Maximum Marks in each Subject)

We straight away need a correlated subquery in the above case if we need to find all the students who obtained maximum marks in each subject (category).

Now subject depends on the outer query (as the outer query’s subjectId is going to be referenced in the inner query) so we cannot build the inner query independently and that is the major difference between the subquery and its counterpart.

To build it we are going to reference the subject id in the inner query as follows:

The results are shown below:

Use of correlated subquery to view the top students in each subject category (Sarah in Data Structure, Atif in Moderna Databases, Sarah in Business Intelligence)

The above screenshot clearly finds out the top students (whose obtained marks are more than anyone) in each subject (category).

You must have spotted the student Atif with even 70 marks in the list and the reason he is there in that list is that if you refer to the StudentSubject table you will soon find out that Atif (out of the three students) is the only student who chose Modern Databases as subject and as a result of that he becomes the top student in Modern Databases.

So, when we want to get a particular aggregate (SUM, AVG, MAX, MIN) in each of the reference table values such as to find out the maximum or minimum in each of the categories we require this special form of subquery we were keen to build right from the beginning.

Now, we are in a very good situation to differentiate between subquery and correlated subquery.

Subquery vs Correlated Subquery

Considering the above example, it is slightly easy to differentiate between subquery and correlated subquery in the light of the following points:

  1. Subquery is independent of outer query but its correlated form depends on the outer query
  2. Subqueries can be built separately (especially in the case of SELECT subquery) to be attached later with an outer query while the correlated version cannot be built separately outside the main query
  3. Subqueries are not executed repeatedly once for each row (especially in the case of the WHERE clause) while correlated subqueries are executed repeatedly once for each row
  4. According to Microsoft documentation, the subquery is also called inner query while correlated subquery can also be termed as repeating subquery

Tip

Although correlated subqueries help you to find detailed information, but it comes with a cost and may soon get resource intensive particularly with tables having a lot of data, so you have to bear this in mind when considering them as part of your solution.

You can also use table-valued functions in the correlated subquery.

Congratulations! You have just learned to use correlated subquery and you now know the reason you require it in your database solution.

Try for yourself

Keeping in mind the examples in this article please test yourself to see if you are able to write the following queries:

  1. Find out the student who obtained minimum marks
  2. Look for the student who obtained minimum marks in each subject category
  3. Please search for the student who got more marks than average marks
  4. Please try writing a script to find the student who got more marks than average marks in each subject category
  5. Finally, please try to locate the student who got fewer marks than average marks in each subject category
Haroon Ashraf
Latest posts by Haroon Ashraf (see all)
SQL commands

About Haroon Ashraf

Haroon’s deep interest in logic and reasoning at an early age of his academic career paved his path to become a data professional. He holds BSc and MSc Degrees in Computer Science and also received the OPF merit award. He began his professional life as a computer programmer more than a decade ago, working on his first data venture to migrate and rewrite a public sector database driven examination system from IBM AS400 (DB2) to SQL Server 2000 using VB 6.0 and Classic ASP along with developing reports and archiving many years of data. His work and interest revolves around Database-Centric Architectures and his expertise include database and reports design, development, testing, implementation and migration along with Database Life Cycle Management (DLM). He has also received passing grade to earn DevOps for Databases verified certificate, an area in which he finds particular interest and potential. View all posts by Haroon Ashraf

168 Views