This article explores the SQL Server PERCENT_RANK analytical function to calculate SQL Percentile and its usage with various examples.
Overview of PERCENT_RANK() function
In a relational database, we store the data in SQL tables. Many times, we fetch the data without any data manipulations. Let’s say you want to generate the SSRS report that contains the top 5% sales for a product in a region. In this case, you need to use SQL Server Analytical functions to perform the data manipulations such as SQL Percentile.
We have the following analytical functions in SQL Server.
- CUME_DIST()
- FIRST_VALUE()
- LAST_VALUE()
- LEAD()
- LAG()
- PERCENTILE_COUNT()
- PERCENTILE_DISC()
- PERCENT_RANK()
You can also direct to SQL Rank functions to understand how you can specify rank to each row of your result set over a partition using these window functions.
PERCENT_RANK()
The PERCENT_RANK function in SQL Server calculates the relative rank SQL Percentile of each row. It always returns values greater than 0, and the highest value is 1. It does not count any NULL values. This function is nondeterministic.
The syntax of the PERCENT_RANK() function is as below:
1 2 3 4 |
PERCENT_RANK() OVER ( [PARTITION BY partition_expression, ... ] ORDER BY sort_expression [ASC | DESC], ... ) |
In this syntax, we use the following parameters.
- PARTITION BY: By default, SQL Server treats the whole data set as a single set. We can specify the PARTITION By clause to divide data into multiple sets. On each set, the Percent_Rank function performs the analytical calculations. It is an optional parameter
- Order By: We use Order by clause to sort the data in either ascending or descending order. It is a compulsory parameter
Let’s look at the examples to understand it.
Example 1: PERCENT_RANK function for calculating the SQL Percentile of student marks
In this example, let’s use a CTP ( common table expression) to hold the student marks, and we use PERCENT_RANK function from this data for marks in ascending order.
1 2 3 4 5 6 7 8 9 10 |
WITH Student AS (SELECT Marks FROM(VALUES(75), (59), (69), (88), (72), (86), (60), (67), (49), (81), (93)) Val(Marks)), Percentiles AS (SELECT Marks, PERCENT_RANK() OVER( ORDER BY Marks) AS Percent_Rank FROM Student) SELECT * FROM Percentiles; |
In the output, we can see following outcomes
- A lowest mark 49 gets Percent_Rank value zero
- The highest marks 93 gets Percent_Rank value one
Example 2: PERCENT_RANK function with to calculate SQL Percentile having NULL values
Let’s add a NULL value in the student marks and rerun the query from example 1.
1 2 3 4 5 6 7 8 9 10 |
WITH Student AS (SELECT Marks FROM(VALUES(75), (59), (69), (88), (72), (86), (60), (67), (49), (81), (93),(NULL)) Val(Marks)), Percentiles AS (SELECT Marks, PERCENT_RANK() OVER( ORDER BY Marks) AS Percent_Rank FROM Student) SELECT * FROM Percentiles; |
As shown in the following screenshot, you always get zero for the NULL values.
Example 3: PERCENT_RANK function to calculate SQL Percentile having duplicate values
Suppose multiple students are having similar marks, and we want to calculate the percent rank. It is important to know that the function will return the same or different rank for a student with similar marks.
1 2 3 4 5 6 7 8 9 10 |
WITH Student AS (SELECT Marks FROM(VALUES(75), (59), (69), (88), (72), (86), (60), (67), (49), (81),(81), (93)) Val(Marks)), Percentiles AS (SELECT Marks, PERCENT_RANK() OVER( ORDER BY Marks) AS Percent_Rank FROM Student) SELECT * FROM Percentiles; |
We have two students with marks 81. In the following example, we can see that percent_rank function returns the same value.
Here is a catch, Let’s say two students have got the highest marks 93 and we want to calculate the percent_rank for these.
1 2 3 4 5 6 7 8 9 10 |
WITH Student AS (SELECT Marks FROM(VALUES(75), (59), (69), (88), (72), (86), (60), (67), (49), (81),(93), (93)) Val(Marks)), Percentiles AS (SELECT Marks, PERCENT_RANK() OVER( ORDER BY Marks) AS Percent_Rank FROM Student) SELECT * FROM Percentiles; |
In this example, we have the highest marks 93, and two students have the same marks. Previously, we get percent rank 1 for the highest rank. Now, we do not get the rank 1, and it calculates the percent rank according to the similar highest marks.
Example 4: PERCENT_RANK function to calculate SQL Percentile with marks in descending order
In previous examples, we did not specify ascending or descending order for the marks. By default, it sorts the data in ascending order. We can specify the DESC clause to sort the data in descending order.
1 2 3 4 5 6 7 8 9 10 |
WITH Student AS (SELECT Marks FROM(VALUES(75), (59), (69), (88), (72), (86), (60), (67), (49), (81),(93)) Val(Marks)), Percentiles AS (SELECT Marks, PERCENT_RANK() OVER( ORDER BY Marks desc) AS Percent_Rank FROM Student) SELECT * FROM Percentiles; |
In the descending order, we can see that the highest marks get 0 ranks and the lowest marks get rank 1.
In the descending order, for the student with similar highest marks always get zero rank.
1 2 3 4 5 6 7 8 9 10 |
WITH Student AS (SELECT Marks FROM(VALUES(75), (59), (69), (88), (72), (86), (93),(60), (67), (49), (81),(93)) Val(Marks)), Percentiles AS (SELECT Marks, PERCENT_RANK() OVER( ORDER BY Marks desc) AS Percent_Rank FROM Student) SELECT * FROM Percentiles; |
Similar to example 3, two students with the lowest marks do not get percent_rank 1.
Example 4: PERCENT_RANK function for SQL Percentile with PARTITION BY clause
In the previous example, we did not use the PARTITION by clause to divide the data into multiple small data sets.
Suppose we want to check the Percent_Rank of an employee pay rate per the department name from the AdventureWorks sample database. In the following query, we use PARTITION BY Department, ORDER BY Rate to define partition on the department as per the rates in the ascending order.
1 2 3 4 5 6 7 8 9 10 |
SELECT Department, Firstname, Rate, PERCENT_RANK() OVER(PARTITION BY Department ORDER BY Rate) AS PctRank FROM HumanResources.vEmployeeDepartmentHistory AS edh INNER JOIN HumanResources.EmployeePayHistory AS e ON e.BusinessEntityID = edh.BusinessEntityID WHERE Department IN(N'Engineering', N'Finance') ORDER BY Department, Rate DESC; |
In the following image, we have two departments group, Engineering and Finance.
- For the Engineering group, Terri has the highest pay rate, and its percent_rank is 1
- Rob has the lowest pay rate, and it gets percent_rank zero in the engineering group
- In another group, Finance, Laura gets percent_rank 1 and David gets percent_rank zero according to their pay rates
If we do not want to have partitions, remove the PARTITION BY clause and it treats the input as single data set.
1 2 3 4 5 6 7 8 9 10 |
SELECT Department, Firstname, Rate, PERCENT_RANK() OVER( ORDER BY Rate) AS PctRank FROM HumanResources.vEmployeeDepartmentHistory AS edh INNER JOIN HumanResources.EmployeePayHistory AS e ON e.BusinessEntityID = edh.BusinessEntityID WHERE Department IN(N'Engineering', N'Finance') ORDER BY Department, Rate DESC; |
Conclusion
In this article, we learned the SQL PERCENT_RANK() function for calculating SQL Percentile. It calculates the relative rank of a row within a group or subset of data. Here is a quick summary of what we learned about SQL Server PERCENT_RANK() function in this article:
- PERCENT_RANK calculate the relative rank of a row within a group of rows
- We use the sort data in ascending or descending order using order by clause
- We can partition data using the PARTITION BY clause, and it divides data into specific small data sets
- A NULL value also gets percent_rank value zero in the output
You should learn this SQL Server Analytical function and be familiar with this.
- How to install PostgreSQL on Ubuntu - July 13, 2023
- How to use the CROSSTAB function in PostgreSQL - February 17, 2023
- Learn the PostgreSQL COALESCE command - January 19, 2023