Ed Pollack

Designing a Calendar Table

March 22, 2017 by

Introduction

There is a common need in reporting to aggregate or return data that is crunched based on date attributes. These may include weekdays, holidays, quarters, or time of year. While any of this information can be calculated on the fly, a calendar table can save time, improve performance, and increase the consistency of data returned by our important reporting processes.

What is a Calendar Table and Why is it Useful?

A calendar table is a permanent table containing a list of dates and various components of those dates. These may be the result of DATEPART operations, time of year, holiday analysis, or any other creative operations we can think of.

The primary key of this table will always be the date, or some easy-to-use representation of that date. Each subsequent column will be an attribute of that date, where the types and size of those columns can vary greatly. At first glance, it may seem that a table such as this would be superfluous, and that this data is easy to generate, but oftentimes as our reporting needs become complex, so does the creation, maintenance, and usage of this data.

As a result, storing calendar data in a permanent location can be an easy solution. Here are some reasons why this data is useful and why storing it in a dedicated table can be a great decision:

  • Data is relatively easy to generate and requires little maintenance once created.
  • Calendar data can be used to service any reports that require it, removing the need to recreate it in each report.
  • We can implement a large number of calendar metrics, and can easily add more as needed.
  • Calendar data is tiny in terms of space used. Performance against this data is generally quite fast.
  • Complex reports can be simplified by removing commonly used DATEPART and DATEADD computations.
  • Important business logic, such as holidays, can be centralized and maintained in a single location.
  • Maintaining calendar data in a single table ensures we do not encounter inconsistencies between different reports, reporting systems, or applications that need it.

There are many different reasons why a calendar table can be useful—this article is our opportunity to create one from scratch, populate it with data, analyze it, and put it to good use!

Implementing a Calendar Table

Our first step is to identify and define metrics that we want to collect. This is where we should look at our reporting needs and determine what sorts of date-related calculations we perform on a regular basis. For our examples here, we will include 33 different metrics (plus the calendar date itself), though you are free to add more as needed. Once introduced, we’ll walk through how to populate this data, and then how to use it.

Date Parts

The simplest metrics are basic date components, including:

  • Calendar Month: The numeric representation of the month, a number from 1-12.
  • Calendar Day: The numeric representation of the calendar day, a number from 1-31. The maximum value depends on the month and on whether it is a leap year.
  • Calendar Year: The numeric representation of the year, such as 1979 or 2017.
  • Calendar Quarter: The numeric representation of the quarter, a number from 1-4.
  • Day Name: The common name for the day of the week, such as Tuesday or Saturday.
  • Day of Week: The numeric representation of the day of the week, a number from 1(Sunday) through 7 (Saturday). In some countries the number 1 is used to represent Monday, though here we will use Sunday for this calculation.
  • Month Name: The common name for the month, such as February or October.

These are all bits and pieces of the date itself and are useful whenever we are looking to find out metrics on specific days of the week, fiscal quarters, or other date parts.

Relative Points in Time

Knowing when a date is, with respect to other calendar metrics, can be very handy in understanding how business changes over time. The following metrics allow you to determine in what part of the week, month, or year a date occurs:

  • Day of Week in Month: The occurrence of a day of week within the current month. Ie: The third Thursday of the current month.
  • Day of Week in Year: The occurrence of a day of week within the current year. Ie: The seventeenth Monday of the current year.
  • Day of Week in Quarter: The occurrence of a day of week within the current quarter. Ie: The seventh Saturday of the current quarter.
  • Day of Quarter: The day number within the current quarter.
  • Day of Year: The day number out of the current year.
  • Week of Month: The week number within the current month. With this calculation, the weeks count starting on the first of the month, regardless of the day of week.
  • Week of Quarter: The week number within the current quarter.
  • Week of Year: The week number within the current year.
  • First Date of Week: The start date of the week. Sunday is assumed here, but could be defined differently.
  • Last Date of Week: The end date of the week. Saturday is assumed here, but could be defined differently.
  • First Date of Month: The first date of the current month.
  • Last Date of Month: The last date of the current month.
  • First Date of Quarter: The first date of the current quarter.
  • Last Date of Quarter: The last date of the current quarter.
  • First Date of Year: The first date of the current year.
  • Last Date of Year: The last date of the current year.

These metrics allow us to easily determine when holidays or special dates occur. In addition, they can be used to assist with special processes that occur during specific weeks, such as tax preparation, invoicing, or other routing operations.

The “Day of Week of…” metrics are measures of how many of a given day have occurred thus far in a time period. The following illustration shows this metric for Wednesdays in March, 2017:

The “Day of Week of Month” for the 1st is 1, the 8th is 2, the 15th is 3, and so on. The similar metrics for quarter and year are calculated in the same fashion, by asking, “So far this year, how many Mondays have we had”, assuming the date we are looking at is a Monday.

Knowing boundaries, such as the start and end of weeks can allow for year-over-year trending by week, or for quick data crunching by specific periods in time.

Holidays and Business Days

Many business care about holidays, holiday seasons, and when business days occur. This can influence load on software systems, allocation of resources, employee coverage, and financial trending. Including some data that describes when holidays occur and what they are can greatly assist in this sort of analysis, removing the need for complex, ad-hoc reporting at a later time.

  • Is Holiday? A bit that indicates if a given date is a holiday or not.
  • Is Holiday Season? A bit that indicates if a given date is part of a holiday season or not.
  • Holiday Name: Indicates the name of the holiday, if applicable.
  • Holiday Season Name: Indicates the name of the holiday season, if applicable.
  • Is Weekday? A bit that indicates if a given day is a weekday, typically Monday-Friday.
  • Is Business Day? A bit that combines weekday and holiday data to determine if it is a business/work day.
  • Previous Business Day: This is the immediately preceding business day.
  • Next Business Day: This is the immediately following business day.

Business days can vary greatly across different industries. A week off for some may equate to the busiest week of the year for others. Summer vacation for teachers could be prime time for an outdoor amusement park. Being able to enumerate all of these rules into a handful of simple bits can make metrics-gathering significantly faster and easier!

Miscellaneous Metrics

We could come up with endless lists of date-related metrics, but for those that didn’t fit into other categories, here are a few examples that could be handy:

  • Is Leap Year: Is the current date is contained within a leap year, then this bit would be set to 1.
  • Days in Month: Contains the number of days in the current month.
  • Calendar Date String: A representation of the date in a string, delimited by forward slashes. This is useful for quickly displaying a more familiar string-based date format to the user. Other formats can be stored, too.

The purpose of a calendar table is to improve the speed, accuracy, and ease of reporting. Many of these metrics could easily be calculated on the fly as a report is run, but over time, the need for more complex metrics would make this incredibly messy. Even getting the days in a month could be a nuisance as a CASE statement including details for leap years would be cumbersome.

Create the Dim_Date Table

Our first step is to create a calendar table for use in all of our subsequent examples:

There are two ways to approach the structure of this data. One is to create computed columns that automatically populate based on the date, and the other is to populate all via script instead. I have chosen to allow for manual population with the rationale being:

  1. Even if we toss 100 years of data into the date table, performance will be good enough that it will populate in approximately a few minutes.
  2. Being able to use metrics immediately for further calculations Is immensely useful, simplifying code & maintenance. This is especially handy for more complex calculations involving calendars and holidays.
  3. Populating a calendar table is a one-time affair. Once complete, the data is available indefinitely.

Also note that the primary key on the table is a DATE. Integers have often been used as the primary key on calendar tables in older versions of SQL Server due to the lack of an appropriate DATE data type. Not only is DATE more intuitive and easier to use, but it only consumes 3 bytes instead of 4 for an INT. In theory, a SMALLINT (2 bytes) could be used, but there is some risk of overflow if we decide to trend far into the past or into the future. If for any reason (backward compatibility, etc…) you needed to use an integer, augmenting this process to do so would not be difficult.

Once created, we can move forward with creating a stored procedure that can be used to insert data into this table.

Populating a Calendar Table

To keep our stored proc simple, we’ll have only 2 parameters: A start and end date. Any existing data in this date range will be deleted and repopulated in its entirety. This allows for easy recreation of data in the event that any changes are made to this process, such as defining holidays or business days.

With the proc declaration out of the way, we can quickly check for a few anomalous conditions that we would want to alert on:

These validations ensure that we are not passing in NULL data or end dates that occur prior to start dates. The error is a nudge on the shoulder instead of throwing a RAISERROR, as this is more of a maintenance utility than an ongoing process.

The next step is to remove any data from Dim_Date that falls within the range specified:

While we could leave data alone or update it, I found that it was convenient when testing to be able to completely destroy existing data and recreate it. Since the amount of time needed to do this is small, the cost was more than worth the convenience.

In order to accurately generate our data, I chose a methodical approach in which we declare variables, assign them one at a time, and then insert a row into Dim_Date when complete. The iterative approach is typically going to operate slower than a set-based approach, but the ability to use scalar variables to do so helps keep that runtime acceptable. A set-based approach could be devised with a numbers table or set of CTEs in order to reduce runtime, but the trade-off would be complexity and contention.

The long laundry list of local variables is as follows:

There is a variable for nearly every column, allowing calculations to be kept simple. We also benefit from being able to use any calculations in subsequent operations. @Date_Counter will iterate from @Start_Date through @End_date, as we insert rows into our calendar table.

From this point on, we enter a loop, assign values to the variables, and then insert a row into Dim_Date. The following are all of these calculations, along with a description of what they mean:

These seven variables all represent date parts and can be captured using DATEPART and some simple TSQL. Once we have these values, we can derive additional information about the date:

Capturing a string version of the date is handy for quick & reliable display in a specific format. The format MM/DD/YYYY is used above, but could very easily be tinkered with in order to display using other orderings or delimiters. CONVERT may be also used to format the date in a variety of forms. For example, consider the following TSQL:

The results show a bunch of ways that we can take a date and reformat it based on local, international, or stylistic formats:

Weekdays are defined here as all days of the week except Saturday & Sunday (the weekend). In some countries or businesses, this may be defined differently, and can easily be tweaked for those purposes. Business days will be defined as all weekdays that are also not holidays. We will handle holidays near the end of our stored procedure, but will set this equal to @Is_Weekday for now, as a convenience for our future calculations where we will set this equal to zero for any holidays identified.

Month and day names are provided as a convenience and can be pulled in order to quickly display date-related strings in a variety of formats.

These statements perform a bit of more complex date math in order to determine the position of the date within its month, quarter, or year. The basis of these calculations is to calculate the start of a given year, week, or month, and then subtract the days or weeks from that point in time and the date being calculated. To emphasize the results, though, and their relative simplicity, the following TSQL results show the date, day of week, and day of week within the month for the start of 2017:

The results show that we are starting at the beginning of the month and breaking it into sets of seven. The same pattern holds true for the quarterly and yearly metrics.

Determining the first and last dates in a given week, month, quarter, or year are similar calculations to positions in time. To figure out the first date in a period, we determine how far into the period we are and subtract that number of days. To determine the last date in a period, we do the opposite, subtracting the current date position from the period length. @Last_Date_of_Month can be determined with a built-in function, EOMONTH, which saves us a bit of work on that calculation.

We can figure out the count of how many given days have occurred up to (and including) the date in question by taking the day number in the period, “rounding up” and dividing by seven (without a remainder). This breaks the period into weekly chunks, allowing us to understand where in the total period the current date falls.

The first BIT column determines if it is a leap year, which can help in knowing how many days are in the year, are in February, or if a leap year child actually gets a birthday in a given year. The second column uses the calendar month and that leap year bit to determine the number of days in the month, which prevents the need to figure it out later.

What is Next?

At this point we have defined all of the data elements we are looking for that can easily be determined up front, picked out data types, and completed those calculations. If you have any additional metrics to add, this is the place to do it. The process to do so is simple:

  1. Add a new column to Dim_Date
  2. Create a new local variable for the metric to be added.
  3. Add a new calculation to assign the appropriate value into that variable.

Likewise, removing a column is the opposite process: Remove the column in Dim_Date, the local variable, and the subsequent calculation.

Our next steps are:

  1. Turn the variables above into a row that we insert into Dim_Date.
  2. Use set-based calculations to add holiday metrics to our data.
  3. Run tests of the stored procedure in order to validate the results.

Conclusion

Calendar tables are extremely useful in any reporting, analytics, or even OLTP use case in which we need to frequently join data on data-related attributes. Not only can they greatly improve performance, but they simplify our code and allow reporting engines to consume that data with ease. As a bonus, we gain maintainability as we can retain a single copy of calendar data in one place. This reduces the likelihood of coding mistakes when operating on date data, especially when the calculations are complex.

Design a calendar table based on the needs of your application and add, remove, or adjust columns as needed. The metrics that matter most to one industry may be irrelevant to another. Completeness is key when creating a structure such as this, though. Consider what metrics you will need, both now and in the future. Also consider how much data you’ll need. Determine the minimum and maximum dates you’re going to need, and be ready to add or remove any when business needs change.

Flexibility, performance, and maintainability are the primary gains to be had when using a calendar table. If you think of any interesting ideas that have not been mentioned here, feel free to share!

Next articles in this series:

References and Further Reading

Always check your results and make sure that calendar data is correct. It’s easy to update and replace, so do not hesitate to apply sufficient scrutiny as this data is intended to be used in many places.


Ed Pollack
General database design

About Ed Pollack

Ed has 20 years of experience in database and systems administration, developing a passion for performance optimization, database design, and making things go faster. He has spoken at many SQL Saturdays, 24 Hours of PASS, and PASS Summit. This lead him to organize SQL Saturday Albany, which has become an annual event for New York’s Capital Region. In his free time, Ed enjoys video games, sci-fi & fantasy, traveling, and being as big of a geek as his friends will tolerate. View all posts by Ed Pollack

168 Views