Emil Drkusic
aggregate functions - the data model we'll use

Learn SQL: Aggregate Functions

January 28, 2020 by

SQL has many cool features and aggregate functions are definitely one of these features, actually functions. While they are not specific to SQL, they are used often. They are part of the SELECT statement, and this allows us to have all benefits of SELECT (joining tables, filtering only rows and columns we need), combined with the power of these functions.

The Model

Before we start talking about aggregate functions, we’ll shortly comment on the data model we’ll be using.

aggregate functions - the data model we'll use

This is the same model we’ve been using in a few past articles. I won’t go into details, but rather mention that all 6 tables in the model contain data. Some of the records in tables are referenced in others, while some are not. E.g. we have countries without any related city, and we have cities without any related customers. We’ll comment on this in the article where it will be important.

The Simplest Aggregate Function

We’ll, of course, start with the simplest possible aggregate function. But, before we do it, let’s check the contents of the two tables we’ll use throughout this article. There are tables country and city. We’ll use the following statements:

You can see the result in the picture below:

check contents of the table

This is nothing new and unexpected. We’ve just listed everything that is in our tables ( “*” in the query will result in returning all columns/attributes, while the lack of any condition/WHERE part of the query will result in returning all rows).

The only thing I would like to point out is that the country table has 7 rows and that the city table has 6 rows. Now, let’s examine the following queries and their result:

COUNT number of rows from the tables

We can notice that for each query we got one row as a result, and the number returned represents the number of rows in each of these two tables. That’s what aggregate function COUNT does. It takes what the query without COUNT would return, and then returns the number of rows in that result. One more important thing you should be aware of is that only COUNT can be used with “*”. All other functions shall require an attribute (or formula) between brackets. We’ll see that later.

Aggregate Functions & JOINs

Now let’s try two more things. First, we’ll test how COUNT works when we’re joining tables. To do that, we’ll use the following queries:

check the contents of the table and COUNT rows

While the first query is not needed, I’ve used it to show what it will return. I did that because this is what the second query counts. When two tables are joined, you can think of that result as of some intermediate table that can be used as any other tables (e.g. for calculations using aggregate functions, in subqueries).

  • Tip: Whenever you’re writing a complex query, you can check what would parts return and that way you’ll be sure your query is working and will be working, as expected.

Also, we should notice, one more thing. We’ve used INNER JOIN while joining tables country and city. This will eliminate countries without any cities from the result (you can check why here). Now we’ll run 3 more queries where tables are joined using LEFT JOIN:

testing the contents of tables and performing simple COUNTs

We can notice a few things:

  • 1st query returned 8 rows. These are the same 6 rows as in a query using INNER JOIN and 2 more rows for countries that don’t have any related city (Russia & Spain)
  • 2nd query counts the number of rows 1st query returns, so this number is 8
  • 3rd query has two important things to comment on. The first one is that we’ve used aggregate function (COUNT), twice in the SELECT part of the query. This will usually be the case because you’re interested in more details about the group you want to analyze (number of records, average values, etc.). The second important thing is that these 2 counts used column names instead of “*” and they returned different values. That happens because COUNT was created that way. If you put column names between brackets COUNT will count how many values are there (not including NULL values). All our records had value for country_name, so the 1st COUNT returned 8. On the other hand, city_name wasn’t defined 2 times (=NULL), so the 2nd COUNT returned 6 (8-2=6)
  • Note: This stands for other aggregate functions as well. If they run into NULL values, they will simply ignore them and calculate as they don’t exist.

SQL Aggregate Functions

Now it’s time that we mention all T-SQL aggregate functions. The most commonly used are:

  • COUNT – counts the number of elements in the group defined
  • SUM – calculates the sum of the given attribute/expression in the group defined
  • AVG – calculates the average value of the given attribute/expression in the group defined
  • MIN – finds the minimum in the group defined
  • MAX – finds the maximum in the group defined

These 5 are most commonly used and they are standardized so you’ll need them not only in SQL Server but also in other DBMSs. The remaining aggregate functions are:

  • APPROX_COUNT_DISTINCT
  • CHECKSUM_AGG
  • COUNT_BIG
  • GROUPING
  • GROUPING_ID
  • STDEV
  • STDEVP
  • STRING_AGG
  • VAR
  • VARPB

While all aggregate functions could be used without the GROUP BY clause, the whole point is to use the GROUP BY clause. That clause serves as the place where you’ll define the condition on how to create a group. When the group is created, you’ll calculate aggregated values.

  • Example: Imagine that you have a list of professional athletes and you know which sport each one of them plays. You could ask yourself something like – From my list, return the minimal, maximal and average height of players, grouped by the sport they play. The result would be, of course, MIN, MAX, and AVG height for groups – “football players”, “basketball players”, etc.

Aggregate Functions – Examples

Now, let’s take a look at how these functions work on a single table. They are rarely used this way, but it’s good to see it, at least for educational purposes:

count values on the single table

The query returned aggregated value for all cities. While these values don’t have any practical use, this shows the power of aggregate functions.

Now we’ll do something smarter. We’ll use these functions in a way much closer than what you could expect in real-life situations:

aggregate functions examples using inner join

This is a much “smarter” query than the previous one. It returned the list of all countries, with a number of cities in them, as well as SUM, AVG, MIN, and MAX of their lat values.

Please notice that we’ve used the GROUP BY clause. By placing country.id and country. country_name, we’ve defined a group. All cities belonging to the same country will be in the same group. After the group is created, aggregated values are calculated.

  • Note: The GROUP BY clause must contain all attributes that are outside aggregate functions (in our case that was country.country_name). You could also include other attributes. We’ve included country.id because we’re sure it uniquely defines each country.

Conclusion

Aggregate functions are a very powerful tool in databases. They serve the same purpose as their equivalents in MS Excel, but the magic is that you can query data and apply functions in the same statement. Today, we’ve seen basic examples. Later in this series, we’ll use them to solve more complicated problems (with more complicated queries), so stay tuned.

Table of contents

Learn SQL: CREATE DATABASE & CREATE TABLE Operations
Learn SQL: INSERT INTO TABLE
Learn SQL: Primary Key
Learn SQL: Foreign Key
Learn SQL: SELECT statement
Learn SQL: INNER JOIN vs LEFT JOIN
Learn SQL: SQL Scripts
Learn SQL: Types of relations
Learn SQL: Join multiple tables
Learn SQL: Aggregate Functions
Learn SQL: How to Write a Complex SELECT Query
Learn SQL: The INFORMATION_SCHEMA Database
Learn SQL: SQL Data Types
Learn SQL: Set Theory
Learn SQL: User-Defined Functions
Learn SQL: User-Defined Stored Procedures
Learn SQL: SQL Views
Learn SQL: SQL Triggers
Learn SQL: Practice SQL Queries
Learn SQL: SQL Query examples
Learn SQL: Create a report manually using SQL queries
Learn SQL: SQL Server date and time functions
Learn SQL: Create SQL Server reports using date and time functions
Learn SQL: SQL Server Pivot Tables
Learn SQL: SQL Server export to Excel
Learn SQL: Intro to SQL Server loops
Learn SQL: SQL Server Cursors
Learn SQL: SQL Best Practices for Deleting and Updating data
Learn SQL: Naming Conventions

Emil Drkusic
SQL commands, T-SQL

About Emil Drkusic

Emil is a database professional with 10+ years of experience in everything related to databases. During the years, he worked in the IT and finance industry and now works as a freelancer. His past and present engagements vary from database design and coding to teaching, consulting, and writing about databases. Also not to forget, BI, creating algorithms, chess, philately, 2 dogs, 2 cats, 1 wife, 1 baby... You can find him on LinkedIn View all posts by Emil Drkusic

4,686 Views