Rajendra Gupta
SQL Percentile function

Calculate SQL Percentile using the PERCENT_RANK function in SQL Server

August 15, 2019 by

This article explores the SQL Server PERCENT_RANK analytical function to calculate SQL Percentile and its usage with various examples.

Overview of PERCENT_RANK() function

In a relational database, we store the data in SQL tables. Many times, we fetch the data without any data manipulations. Let’s say you want to generate the SSRS report that contains the top 5% sales for a product in a region. In this case, you need to use SQL Server Analytical functions to perform the data manipulations such as SQL Percentile.

We have the following analytical functions in SQL Server.

  • CUME_DIST()
  • FIRST_VALUE()
  • LAST_VALUE()
  • LEAD()
  • LAG()
  • PERCENTILE_COUNT()
  • PERCENTILE_DISC()
  • PERCENT_RANK()

You can also direct to SQL Rank functions to understand how you can specify rank to each row of your result set over a partition using these window functions.

PERCENT_RANK()

The PERCENT_RANK function in SQL Server calculates the relative rank SQL Percentile of each row. It always returns values greater than 0, and the highest value is 1. It does not count any NULL values. This function is nondeterministic.

The syntax of the PERCENT_RANK() function is as below:

In this syntax, we use the following parameters.

  • PARTITION BY: By default, SQL Server treats the whole data set as a single set. We can specify the PARTITION By clause to divide data into multiple sets. On each set, the Percent_Rank function performs the analytical calculations. It is an optional parameter
  • Order By: We use Order by clause to sort the data in either ascending or descending order. It is a compulsory parameter

Let’s look at the examples to understand it.

Example 1: PERCENT_RANK function for calculating the SQL Percentile of student marks

In this example, let’s use a CTP ( common table expression) to hold the student marks, and we use PERCENT_RANK function from this data for marks in ascending order.

In the output, we can see following outcomes

  • A lowest mark 49 gets Percent_Rank value zero
  • The highest marks 93 gets Percent_Rank value one

SQL Percentile function

Example 2: PERCENT_RANK function with to calculate SQL Percentile having NULL values

Let’s add a NULL value in the student marks and rerun the query from example 1.

As shown in the following screenshot, you always get zero for the NULL values.

PERCENT_RANK function with NULL values

Example 3: PERCENT_RANK function to calculate SQL Percentile having duplicate values

Suppose multiple students are having similar marks, and we want to calculate the percent rank. It is important to know that the function will return the same or different rank for a student with similar marks.

We have two students with marks 81. In the following example, we can see that percent_rank function returns the same value.

PERCENT_RANK function with Duplicate values

Here is a catch, Let’s say two students have got the highest marks 93 and we want to calculate the percent_rank for these.

In this example, we have the highest marks 93, and two students have the same marks. Previously, we get percent rank 1 for the highest rank. Now, we do not get the rank 1, and it calculates the percent rank according to the similar highest marks.

PERCENT_RANK function with Duplicate values

Example 4: PERCENT_RANK function to calculate SQL Percentile with marks in descending order

In previous examples, we did not specify ascending or descending order for the marks. By default, it sorts the data in ascending order. We can specify the DESC clause to sort the data in descending order.

In the descending order, we can see that the highest marks get 0 ranks and the lowest marks get rank 1.

PERCENT_RANK function to calculate SQL Percentile with marks in descending order

In the descending order, for the student with similar highest marks always get zero rank.


PERCENT_RANK function to calculate SQL Percentile with marks in descending order

Similar to example 3, two students with the lowest marks do not get percent_rank 1.

PERCENT_RANK function with duplicate lowest and highest values

Example 4: PERCENT_RANK function for SQL Percentile with PARTITION BY clause

In the previous example, we did not use the PARTITION by clause to divide the data into multiple small data sets.

Suppose we want to check the Percent_Rank of an employee pay rate per the department name from the AdventureWorks sample database. In the following query, we use PARTITION BY Department, ORDER BY Rate to define partition on the department as per the rates in the ascending order.

In the following image, we have two departments group, Engineering and Finance.

  • For the Engineering group, Terri has the highest pay rate, and its percent_rank is 1
  • Rob has the lowest pay rate, and it gets percent_rank zero in the engineering group
  • In another group, Finance, Laura gets percent_rank 1 and David gets percent_rank zero according to their pay rates

PERCENT_RANK function with  PARTITION BY clause

If we do not want to have partitions, remove the PARTITION BY clause and it treats the input as single data set.


PERCENT_RANK function without PARTITION BY clause

Conclusion

In this article, we learned the SQL PERCENT_RANK() function for calculating SQL Percentile. It calculates the relative rank of a row within a group or subset of data. Here is a quick summary of what we learned about SQL Server PERCENT_RANK() function in this article:

  • PERCENT_RANK calculate the relative rank of a row within a group of rows
  • We use the sort data in ascending or descending order using order by clause
  • We can partition data using the PARTITION BY clause, and it divides data into specific small data sets
  • A NULL value also gets percent_rank value zero in the output

You should learn this SQL Server Analytical function and be familiar with this.

Rajendra Gupta
Latest posts by Rajendra Gupta (see all)
168 Views