Rajendra Gupta

Overview of the SQL Count Distinct Function

June 26, 2019 by Rajendra Gupta

This article explores the SQL COUNT DISTINCT function, which counts the unique values in a result set and ignores duplicates.

Developers often need to query a SQL table under multiple conditions. Sometimes we want all rows in a table except those with NULL values. At other times we want only the distinct records, for example, the distinct customers who placed an order last year.

Let’s start with a quick overview of the SQL COUNT function.

SQL Count Function

We use the SQL COUNT aggregate function to get the number of rows in the output. Suppose we have a product table that holds records for all products sold by a company, and we want to know how many products were sold during the last quarter. We can use the SQL COUNT function to return the number of rows that match the specified condition.

The syntax of the SQL COUNT function:
COUNT ([ALL | DISTINCT] expression);

By default, the SQL Server COUNT function uses the ALL keyword. With ALL, SQL Server counts every non-NULL value of the expression, including duplicate values.

Let’s create a sample table and insert a few records into it.

In this table, we have duplicate values and NULL values as well.
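The original table definition is not reproduced here, so the following is only a minimal sketch of such a table. The table name `dbo.CountDemo`, the `ID` column, and the row values are hypothetical assumptions; only the column name `Col1` comes from the discussion that follows.

```sql
-- Hypothetical sample table: the name, the ID column, and the values are assumptions.
CREATE TABLE dbo.CountDemo
(
    ID   INT IDENTITY(1, 1) PRIMARY KEY,
    Col1 VARCHAR(10) NULL
);

-- One duplicated value ('A' twice), one unique value ('B'), and two NULLs.
INSERT INTO dbo.CountDemo (Col1)
VALUES ('A'), ('A'), ('B'), (NULL), (NULL);

-- Inspect the sample data.
SELECT ID, Col1
FROM dbo.CountDemo;
```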

[Image: Sample data]

In the following screenshot, we can note that:

  • COUNT(*) includes duplicate values as well as NULL values
  • COUNT(Col1) includes duplicate values but does not include NULL values
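The two points above can be checked with a small, self-contained query. This is a sketch under assumptions: the inline rows are hypothetical stand-ins for the sample table (one value twice, one value once, and two NULLs).

```sql
-- Hypothetical inline rows: 'A' twice, 'B' once, NULL twice.
SELECT COUNT(*)        AS CountStar,     -- 5: every row, including duplicates and NULLs
       COUNT(Col1)     AS CountCol1,     -- 3: duplicates are counted, the two NULLs are not
       COUNT(ALL Col1) AS CountAllCol1   -- 3: ALL is the default, so this matches COUNT(Col1)
FROM (VALUES ('A'), ('A'), ('B'), (NULL), (NULL)) AS t (Col1);
```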

[Image: SQL Count function]

Suppose we want to know the number of distinct values available in the table. We can use SQL COUNT DISTINCT to do so.

In the following output, the count is only 2. SQL COUNT DISTINCT eliminates both duplicate and NULL values before counting.
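As a hedged illustration, the query below runs COUNT DISTINCT against hypothetical inline rows (the values are assumptions, not the article's data). Duplicates collapse into one value and NULLs are skipped, so only two values are counted.

```sql
-- Hypothetical inline rows: 'A' twice, 'B' once, NULL twice.
SELECT COUNT(DISTINCT Col1) AS DistinctCol1   -- 2: 'A' and 'B'
FROM (VALUES ('A'), ('A'), ('B'), (NULL), (NULL)) AS t (Col1);
```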

[Image: SQL Count Distinct function]

Let’s look at another example. Here we have a location table that consists of two columns, City and State.

Now, execute the following query to count the distinct cities in the table.
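The query might look like the sketch below. The table variable `@Location` and the state names are assumptions made for illustration; the city names and the two columns are taken from the text.

```sql
-- Hypothetical location data: the state names are illustrative assumptions.
DECLARE @Location TABLE (City VARCHAR(30), [State] VARCHAR(30));

INSERT INTO @Location (City, [State])
VALUES ('Gurgaon', 'Haryana'),
       ('Gurgaon', 'Rajasthan'),
       ('Jaipur',  'Rajasthan'),
       ('Jaipur',  'Haryana');

-- Count each city once, no matter how many rows mention it.
SELECT COUNT(DISTINCT City) AS DistinctCities   -- 2: Gurgaon and Jaipur
FROM @Location;
```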

It returns a count of 2 unique cities (Gurgaon and Jaipur).

[Image: SQL Count Distinct function]

If we look at the data, the same city name appears in more than one state. Each combination of city and state is unique, and we do not want those unique combinations to be eliminated from the output.

We can use the SQL DISTINCT keyword on a combination of columns as well. It compares the combined values and removes a row only if the whole combination duplicates another row.

It does not remove the repeated city names from the output, because each city and state combination is unique.
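A minimal sketch of SELECT DISTINCT on two columns follows. The table variable and the state names are hypothetical; the point is only that rows are compared as whole (City, State) pairs.

```sql
-- Hypothetical location data: the state names are illustrative assumptions.
DECLARE @Location TABLE (City VARCHAR(30), [State] VARCHAR(30));

INSERT INTO @Location (City, [State])
VALUES ('Gurgaon', 'Haryana'),
       ('Gurgaon', 'Rajasthan'),
       ('Jaipur',  'Rajasthan'),
       ('Jaipur',  'Haryana');

-- All 4 rows are returned: the city names repeat, but no (City, State) pair does.
SELECT DISTINCT City, [State]
FROM @Location;
```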

[Image: Output of SELECT DISTINCT]

Let’s insert one more row into the location table.

We have 5 records in the location table. In the data, you can see we have one combination of city and state that is not unique.

[Image: Sample data]

Rerun the SELECT DISTINCT query, and it should return only 4 rows this time.

[Image: SQL Count Distinct function]

We cannot use the SQL COUNT DISTINCT function directly with multiple columns. If we try, we get the following error message.

[Image: SQL Count Distinct function incorrect syntax error]

We can use a temporary table to hold the records returned by SELECT DISTINCT and then use COUNT(*) to get the row count.
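The temporary-table approach can be sketched as follows. The names `@Location` and `#DistinctLocation`, the state names, and the choice of which row is duplicated are all assumptions. The derived-table variant at the end is an alternative added here for comparison, not something the article describes.

```sql
-- Hypothetical data: 5 rows, with ('Gurgaon', 'Haryana') inserted twice.
DECLARE @Location TABLE (City VARCHAR(30), [State] VARCHAR(30));

INSERT INTO @Location (City, [State])
VALUES ('Gurgaon', 'Haryana'),
       ('Gurgaon', 'Rajasthan'),
       ('Jaipur',  'Rajasthan'),
       ('Jaipur',  'Haryana'),
       ('Gurgaon', 'Haryana');

-- Not valid in SQL Server: COUNT(DISTINCT ...) accepts a single expression.
-- SELECT COUNT(DISTINCT City, [State]) FROM @Location;

-- Stage the distinct combinations in a temporary table, then count them.
SELECT DISTINCT City, [State]
INTO #DistinctLocation
FROM @Location;

SELECT COUNT(*) AS DistinctCombinations   -- 4
FROM #DistinctLocation;

DROP TABLE #DistinctLocation;

-- Alternative without a temporary table: count the rows of a derived table.
SELECT COUNT(*) AS DistinctCombinations   -- 4
FROM (SELECT DISTINCT City, [State] FROM @Location) AS d;
```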

We get a row count of 4 in the output.

[Image: Count function with a temporary table]

If we use a combination of columns to get distinct values and any of the columns contains a NULL value, SQL Server still treats that combination as a distinct combination of its own.

To verify this, let’s insert more records into the location table without specifying any state.

Let’s look at the location table data.

[Image: Location table sample data]

Re-run the query to get distinct rows from the location table.

In the output, we can see it does not eliminate the combination of City and State with the blank or NULL values.

[Image: The combination of City and State with blank or NULL values]

Similarly, counting the distinct combinations of City and State now gives a row count of 6.
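The effect of NULL values can be sketched with hypothetical data. The state names, the duplicated row, and the two rows inserted without a state are assumptions chosen so that the distinct count comes to 6. A derived table is used here instead of a temporary table to keep the example short.

```sql
-- Hypothetical data: 5 rows as before (one duplicate) plus 2 rows with no state.
DECLARE @Location TABLE (City VARCHAR(30), [State] VARCHAR(30) NULL);

INSERT INTO @Location (City, [State])
VALUES ('Gurgaon', 'Haryana'),
       ('Gurgaon', 'Rajasthan'),
       ('Jaipur',  'Rajasthan'),
       ('Jaipur',  'Haryana'),
       ('Gurgaon', 'Haryana');

-- No state is specified, so [State] is NULL for these rows.
INSERT INTO @Location (City)
VALUES ('Gurgaon'), ('Jaipur');

-- ('Gurgaon', NULL) and ('Jaipur', NULL) are kept as their own combinations.
SELECT DISTINCT City, [State]
FROM @Location;

SELECT COUNT(*) AS DistinctCombinations   -- 6: four city/state pairs plus two with NULL state
FROM (SELECT DISTINCT City, [State] FROM @Location) AS d;
```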

[Image: SQL Count distinct does not eliminate the combination of City and State with blank or NULL values]

Difference between SELECT COUNT, COUNT(*) and SQL COUNT distinct

COUNT(column)

  • It returns the number of rows that satisfy the conditions specified in the WHERE clause and have a non-NULL value in the column.
  • It gives the count of rows and does not eliminate duplicate values.
  • It eliminates NULL values.

COUNT(*)

  • It returns the total number of rows that satisfy the conditions specified in the WHERE clause.
  • It considers all rows, regardless of duplicate or NULL values.
  • It does not eliminate NULL values.

COUNT(DISTINCT column)

  • It returns the number of distinct values among the rows that satisfy the conditions specified in the WHERE clause.
  • It counts values after eliminating NULL and duplicate values.
  • It eliminates NULL values.

Execution Plan of SQL Count distinct function

Let’s look at the Actual Execution Plan of the SQL COUNT DISTINCT function. You need to enable the Actual Execution Plan from the SSMS Menu bar as shown below.
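As an aside that is not part of the original walkthrough, the actual plan can also be requested from T-SQL with `SET STATISTICS XML ON`. The sketch below uses a hypothetical temporary table `#Location` with assumed state names. The operators and costs on such a tiny table may differ from the ones discussed in this section.

```sql
-- Hypothetical data in a temporary table: the state names are assumptions.
CREATE TABLE #Location (City VARCHAR(30), [State] VARCHAR(30));

INSERT INTO #Location (City, [State])
VALUES ('Gurgaon', 'Haryana'),
       ('Gurgaon', 'Rajasthan'),
       ('Jaipur',  'Rajasthan'),
       ('Jaipur',  'Haryana');

-- While this setting is ON, each statement also returns its actual plan as XML.
SET STATISTICS XML ON;

SELECT COUNT(DISTINCT City) AS DistinctCities
FROM #Location;

SET STATISTICS XML OFF;

DROP TABLE #Location;
```

In SSMS, clicking the returned XML opens the graphical plan.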

[Image: Enable Actual Execution Plan]

Execute the query to get the actual execution plan. In this execution plan, you can see the top resource-consuming operators:

  • Sort (Distinct Sort) – Cost 78%
  • Table Scan – Cost 22%

[Image: Actual Execution Plan of SQL Count distinct]

You can hover the mouse over the sort operator, and it opens a tool-tip with the operator details.

[Image: Sort operator properties]

In the properties window, we get more details about the sort operator, including memory allocation, statistics, and the number of rows.

[Image: Sort operator properties]

In a table with millions of records, SQL COUNT DISTINCT might cause performance issues because the distinct sort is a costly operator in the actual execution plan.

SQL Server 2019 introduces a new function, APPROX_COUNT_DISTINCT, as a faster alternative to SQL COUNT DISTINCT. It returns an approximate distinct count of the rows, so its output might differ slightly from the exact COUNT DISTINCT result.

Starting with SQL Server 2019, you can replace COUNT(DISTINCT expression) with APPROX_COUNT_DISTINCT(expression) to use this function.
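A minimal side-by-side sketch, assuming SQL Server 2019 or later. The table variable and the state names are hypothetical. On real workloads the approximate result may differ slightly from the exact one, so no expected value is shown for it.

```sql
-- Requires SQL Server 2019 or later. The data below is a hypothetical sample.
DECLARE @Location TABLE (City VARCHAR(30), [State] VARCHAR(30));

INSERT INTO @Location (City, [State])
VALUES ('Gurgaon', 'Haryana'),
       ('Gurgaon', 'Rajasthan'),
       ('Jaipur',  'Rajasthan'),
       ('Jaipur',  'Haryana');

SELECT COUNT(DISTINCT City)        AS ExactDistinctCities,    -- exact count
       APPROX_COUNT_DISTINCT(City) AS ApproxDistinctCities    -- approximate count
FROM @Location;
```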

[Image: Actual execution plan of Approx_Count_distinct]

You can explore more on this function in The new SQL Server 2019 function Approx_Count_Distinct.

Conclusion

In this article, we explored the SQL COUNT function with various examples. We also covered the new APPROX_COUNT_DISTINCT function, available from SQL Server 2019. I would suggest evaluating both in your own environment. If you have any comments or questions, feel free to leave them in the comments below.

Rajendra Gupta

Rajendra has 8+ years of experience in database administration, with a passion for database performance optimization, monitoring, high availability and disaster recovery technologies, and learning new things and features.

He works as a senior consultant DBA for big customers, is certified with MCSA SQL 2012, and likes to share knowledge on various blogs.
He can be reached at rajendra.gupta16@gmail.com
