Ed Pollack

Implementing and Using Calendar Tables

March 24, 2017 by

Introduction

There is a common need in reporting to aggregate or return data that is crunched based on date attributes. These may include weekdays, holidays, quarters, or time of year. While any of this information can be calculated on the fly, a calendar table can save time, improve performance, and increase the consistency of data returned by our important reporting processes. In my previous article, you could learn about designing of a calendar table.

Implementing a Calendar Table

When the design process for a calendar table is complete, we can begin creating our date-related data. In this article, we will run through the remaining TSQL needed to create rows in Dim_Date, add holiday metrics, and demo a few uses of the data. As has been the theme thus far, creativity, flexibility, and customization are key to making this a success. Only use data elements you need, and feel free to add more as needed.

At this point, let’s create a row in Dim_Date for the current date being processed:

Here we insert a row into our calendar table, increment @Date_Counter, and go back to the start of our loop. Some columns are left NULL intentionally, as we will be populating them below. Holidays and business days can be calculated in a set-based fashion once all of our other data has been inserted into the calendar table.

Holiday calculations are completely up to you. Include whatever holidays are relevant, needed for reporting, or helpful in understanding business activity. If holidays are not needed then you may leave out those columns, as well as much of the remaining code in this article. For the examples below, we’ve included wide variety of holidays that are calculated in different ways. For example, President’s day is on the 3rd Monday in February, which can be determined using the month (2), day of week (Monday), and day-of-week-in-month (3). Alternatively, US independence day is simple as it can be determined using only the month (7) and day (4).

Note that each holiday definition not only designates and names a holiday, but adjusts Is_Business_Day accordingly. This allows us to determine for each holiday whether it is a business day (do nothing) or not (set it to zero). We also constrain holiday assignments to the date range provided in the stored procedure parameters.

Also note that these are all of the literal holidays as they are defined. Oftentimes, holidays that happen to fall on weekends will have an observed national holiday on Monday. This is not accounted for here, but could easily be adjusted for later on if needed. For example, we could search the table for any instances of Christmas that fall on Saturday or Sunday and immediately change the holiday to be on Monday, and append “Observed” to the holiday name. It’s an extra step that could be applied in a set-based fashion to any group of holidays, incrementing 2 days if Saturday and 1 day if Sunday.

With holidays and business days defined, we can use (somewhat messy) common table expressions to determine the previous and next business days. These calculations can be very useful in understanding how business reacts to the start or end of a streak of business days or non-business days. For example, does work pile up over long weekends, resulting in an influx of business on the next work day?

Holiday seasons are another consideration that may be useful when determining how business reacts to the lead-up to a holiday, a season, or proximity to a holiday. For this example, we’ll populate the Christmas holiday season:

In this TSQL, we define Thanksgiving and use that date to create a sequence of dates between it and Christmas, which becomes our holiday season. Seasons can vary widely based on business needs, encompassing times of year, holidays, weather, or special events. If desired, we could add and populate a day of season column, in order to track how many days into a given season we are (or how many remain).

That’s the end of the proc! Now, let’s test it out:

This takes about seven seconds to run and generates 5480 rows of date data for our use. The results look like this:

All scripts in this article are attached at the end to allow for easy tinkering. Feel free to download and modify to your heart’s content!

Customization

A calendar table can be customized and adjusted to meet your business needs. If ¾ of these columns are unnecessary, then feel free to remove them. If there are additional metrics that are useful to your applications or reports, then feel free to add them! If a piece of data can be derived from a date, then it might be useful here.

The purpose of a calendar table is to improve the efficiency of reports or processes that require calculations to be made based on date components. In addition, we centralize the calendar data, which ensures consistency across any code that requires it. This is preferable to calculating these metrics on-the-fly every time we need them.

Performance

In our examples, we generated quite a few different columns, but you’re free to use as many or as few as makes sense for your application. This is a scenario in which there is no need to fear a wide table as most metrics will be relatively small (such as BIT, DATE, or TINYINT). Additionally, we have total control over the row count in a calendar table. For example, 75 years of data would result in a relatively lean 365 * 75 + 19 (27,394) rows of data (one row per date).

This is a reporting table that we can add more rows to later at a future time. That being said, if we intentionally only include a few years, we need to remember to add more in the future, or reports may stop functioning correctly. In addition, for forecasting and predictive analytics, we may need to span many years in the future, even if our current data set does not extend that far. Of course, if we know for certain that we will never need data before or after any given date, then by all means include only the data that is needed.

With regards to data types, be sure to choose the smallest possible types for a given column. DATE is preferable to DATETIME, BIT smaller than INT, and consider using TINYINT or SMALLINT instead of INT. For example, the day number within a given year will never be larger than 366 or smaller than 1, therefore a SMALLINT is more than enough for this column.

Using a Calendar Table in Reporting

Using a calendar table optimally requires that we add a date key to reporting tables, index, and join the calendar table on that column. To test this, let’s use Sales.SalesOrderHeader in AdventureWorks:

This creates a date key for joining om SalesOrderHeader and indexes it.

This returns no results (!?):

The calendar data we created spanned 2015 – 2030, but the data in SalesOrderHeader is older. As a result, our INNER JOIN returns no matches and we get nothing back. To resolve this, we’ll add more data to our calendar table:

This runs in about a second, and when we rerun our SELECT, we get results:

Now, let’s try out a few queries:

This tells us which territories were responsible for the most sales on Christmas, ordered by the most on top and fewest on the bottom:

This query groups orders by month, so we can see which months have the most and fewest orders:

These may seem like simple examples, but many reporting and analytics programs expect simple dimensions for consumption. Having the month number or day name is often required in order to perform any calculations using those columns. If we decide to take things further and crunch more complex uses of date components, then having the dimensions pre-calculated will save immense time and ensure that work is possible, let along efficient.

Conclusion

Calendar tables are extremely useful in any reporting, analytics, or even OLTP use case in which we need to frequently join data on data-related attributes. Not only can they greatly improve performance, but they simplify our code and allow reporting engines to consume that data with ease. As a bonus, we gain maintainability as we can retain a single copy of calendar data in one place. This reduces the likelihood of coding mistakes when operating on date data, especially when the calculations are complex.

One area of calendar tables that was intentionally omitted from this was alternate calendars. There are many uses for the ISO, 4-5-4, and a variety of fiscal calendars that can be joined into our standard calendar data. We’ll tackle this in a future article and build on the work we have completed here!

Other articles in this series:

References and Further Reading

You can download all scripts in this article here.

Always check your results and make sure that calendar data is correct. It’s easy to update and replace, so do not hesitate to apply sufficient scrutiny as this data is intended to be used in many places.


Ed Pollack
General database design

About Ed Pollack

Ed has 20 years of experience in database and systems administration, developing a passion for performance optimization, database design, and making things go faster. He has spoken at many SQL Saturdays, 24 Hours of PASS, and PASS Summit. This lead him to organize SQL Saturday Albany, which has become an annual event for New York’s Capital Region. In his free time, Ed enjoys video games, sci-fi & fantasy, traveling, and being as big of a geek as his friends will tolerate. View all posts by Ed Pollack

168 Views