In this article, we will go over some of the most fundamental ways to filter data hosted in PostgreSQL.
Data is hosted in a variety of data repositories, one of which is relational databases. Out of tens of commercial and open-source relational databases, one of the most popular open-source relational databases is PostgreSQL. This database is offered on the Azure cloud platform through a service named Azure Database for PostgreSQL. One of the most fundamental operations performed on the database is reading and writing data to consume and host data. It goes without saying that when the data is consumed, it must be scoped based on the requirements or criteria specified by the consumer. This translates to filtering the data while querying it. Like every other relational database, Postgres offers different operators and options to filter data while querying. Let’s go ahead and learn some of the most fundamental ways to filter data hosted in PostgreSQL.
Creating PostgreSQL server on Azure
To perform the exercise shown below where we would be looking at different operators in PostgreSQL that filters data, the first thing we need in place is an instance of postgres. It is assumed the one has required access to Azure Database for PostgreSQL service and an instance of Postgres – Single Server edition or any other edition supported by this service has already been created. Once the instance is in place, we will be using the pgAdmin utility to query this database instance. One can use any editor of choice that works with postgres.
Once the database instance as well as editor is in place with successful connectivity, the database objects in the pgAdmin utility would look as shown below. Select the tables object category, right-click on it and select the Query Tool option as shown below. This will open the query window from where we will be executing SQL queries.
To query and filter data, we need to have at least one database object with some data in it. We can create a simple table with a few fields as shown below using the CREATE TABLE command. In this table structure, the id field is auto-increment and the rest of the fields can be provided as desired.
Once the table is created, we need to populate records or values in it. This can be easily done using the INSERT command. One thing we need to keep in view is that the ID field is of the SERIAL datatype, which means that it will auto-increment and we cannot dictate its values in the query while inserting data. So, we need to specify the schema as well while inserting data using the insert command. Once we insert a few records and then select the same using the SELECT command as shown below, we will have the data ready to be used with various filter operators and functions supported by postgresql. In this table structure, we have intentionally added city names in the country field to simulate data discrepancy, which is one of the use-cases used by analysts to filter out such data.
Filtering data in postgresql
One of the most heavily used mechanisms of filtering data is by using the WHERE clause. This clause supports a variety of conditions that can be combined to create from most simplistic to most complex criteria to filter data. If one knows the exact criteria to pinpoint a single record or a range of records, one can provide the exact value of the field in the criteria using the WHERE clause as shown below. Here we are filtering the data to retrieve the record where id is 1. One can use a variety of comparisons, conditional, logical, and different types of operators to form a criterion. Examples of such operators are =, >, <, AND, OR, NOT, etc.
When the criteria are not specific where we pinpoint the exact record, the next broader variety of criteria is to select from a range or enumeration of values. To filter data based on a collection of enumerated values, one common way is to use IN operator with the WHERE clause as shown below. Here we are selecting records that have ID value 1 or 3.
At times, when the data is queried with a filter criterion in the where clause, even after that, the query results are too huge. Typically, in web applications, data is shown in a paginated manner and the requirement is to fetch limited records page-wise. This can be facilitated using LIMIT and OFFSET as shown below. In this example, we are specifying that the query should return only two rows and it should bypass the first row or start from the second row.
In the previous example, we specified the criteria to return only a subset of the data. At times, the requirement is to fetch an exact number of rows where instead of specifying limits, we need to specify the exact number of rows to be fetched in the query results. In this case, we can use the FETCH clause as shown below. Here we are specifying that we intend to fetch only the first two rows from the result set irrespective of the number of rows in the resultset. FETCH clause is a more precise way of specifying the total rows in the resultset. LIMIT is a way to specify thresholds which is not as specific as the FETCH clause. We can also optionally specify the OFFSET option in the FETCH clause to specify the start position from where the result fetching should be started.
When the requirement is to filter data where the filter criteria are not known precisely, the general approach is to use ranges or patterns. In a range scenario, one may want to filter the data between two different thresholds. To query data between two thresholds or range of values, one can use the BETWEEN operator as shown below. Here we are filtering data between values 1 and 3 which would result in the selection of all the 3 rows as the id values are 1, 2 and, 3.
Specifying thresholds would mean that one knows the ranges of values based on which data must be filtered. Another use case is in which we know the pattern of data but not ranges or thresholds. For example, one may know the pattern of values in a specific type of field like email id, credit cards, SSN, Phone number, etc. In such cases, we tend to specify the pattern to use for filtering data. Below is one such example where we are filtering data based on a pattern using the LIKE operator. Here we are selecting records that have email ids that end with gmail.com.
In the above cases discussed so far, the criteria specification is not conditional. At times, we need to specify criteria that must be followed depending upon a condition. In such cases, we can use the CASE WHEN operator. In newer versions of postgresql, this operator has an alternative that is considered a more readable way of specifying conditional criteria to filter data. It’s known as the FILTER clause. In the below example, we intend to return a count of a total number of records that meet the filter criteria. FILTER clause is generally used to filter data in aggregate functions like sum, avg, etc.
In this way, we can use different filtering mechanisms like the universal WHERE clause, operators like BETWEEN, IN, and LIKE, a clause like FILTER, and query result scoping clauses like LIMIT and FETCH.
In this article, we created an instance of Azure Database for PostgreSQL and connected it to it using the pgAdmin utility. We created a basic table and populated it with some sample data. Then we used a variety of operators and clauses to learn the different SQL constructs that are available in PostgreSQL to filter data.
- Using psql tool with Azure Database for PostgreSQL - May 27, 2022
- Introduction to Azure Synapse Lake Database in Azure Synapse Analytics - May 23, 2022
- Import data using Python in Power BI - May 11, 2022