Thomas LeBlanc

SQL Server Data Warehouse design best practice for Analysis Services (SSAS)

April 4, 2017 by

Before jumping into creating a cube or tabular model in Analysis Service, the database used as source data should be well structured using best practices for data modeling. Some might say use Dimensional Modeling or Inmon’s data warehouse concepts while others say go with the future, Data Vault. No matter what conceptual path is taken, the tables can be well structured with the proper data types, sizes and constraints.

This article is going to use a scaled down example of the Adventure Works Data Warehouse. The full version of the Adventure Works DW can be downloaded from CodePlex.

Taking on a complete data warehouse project can be overwhelming. The best path from experience is to create the minimum structure that yields a good result. Not the best, but something of value where management will approve and extend other data warehouse projects.

The beginning can be a data mart that might answer the following questions:

  1. Compare Sales of Mountain Bikes for years 2015 and 2016
  2. What happens to profits when you remove the Bike Category?
  3. Show average sales for subcategories within categories in the USA?
  4. Calculate Gross Profit Margin for Internet Sales region by Year, Quarter and Month


Figure 1 Gross Profit Pivot Table

From these specific inquiries, we can get an idea of the data required. There will be a Data dimension because of the need for Year, Quarter and Month. The main numerical value is sales, but there is a calculation for Gross Profit Margin. Sales Line Item Cost is required to compute Gross profit, then Gross Profit Margin.

There is reference to a Category value of Bike and slicing into Subcategories. Products for Sales Line Items will be relate to Subcategory and Category. Average Sales will be another calculation and is sliced by Country.

The Country is not specific, so the question where is the Country located, from Customers or from Sales Territories. The Reseller Sales does not have a different customer for this company but Internet Sales does. So, Sales Territory and Geography will get the country along with Group and Region.

Figure 2 shows a business database diagram of the full Adventure Works DW with just the fact table for Internet Sales.


Figure 2 Internet Sales Data Mart

There are a lot of columns that can be very confusing. Figure 3 scales down the model to something simpler.


Figure 3 Simpler Internet Sales Data Mart

Since all the dimension tables have IDs as a primary key, the use of the integer data type needs to be looked at. There are different uses like bigint, int, smallint and tinyint. You could even use Boolean if there was only 2 possible integer values – 0 or 1. This design uses an int data type for the Date dimension key because it uses values like 20170124, 20170125, etc.

The Category and Subcategory dimensions can use tinyint because there are less than 255 possible rows. Smallint would work for Customer because there will not be more than 32,000 rows. The int data type will allow 2,000,000+ rows if you use all positive values. You can also use negative values which work well when you have early arriving facts.

The fact table will have these ID columns as foreign keys, and because the fact table grows daily, reducing the size used for these integer type IDs will help keep the table in check as size can be a problem.

Figure 4 Fact Table for Internet Sales

Figure 4 show the creation statement for the Internet Sales fact table. The SalesOrderNumber was originally a NVARCHAR, but it is known to only contain numeric text data with maybe some leading zeroes. The case for using NVARCHAR over VARCHAR usually depends on international sales. The system might need to use multiple accounting systems to be combined into one data warehouse. If this is the case, use NVARCHAR.

The numerical fields are using MONEY data, but SMALLMONEY could be used for some of them. Since they are dollar values, there is no need to use decimal or numeric were the number of decimal places need to be specified.

Once the fact table is created, foreign keys are created to relate to the dimension tables. Figure 5 shows the foreign create statements for the Internet Sales fact table.

Figure 5 Foreign Keys

Some might say that the ETL should enforce foreign key constraints. If that is the case, then there will be no problem adding the foreign keys to the database.

There are cases of early arriving facts. This indicates that the fact arrives before the dimension value when merging data from separate systems. The ETL for the fact table population will need to have intelligence to add the missing dimension first, then use the ID for the fact row, and later update the new temporary dimension row with the late arriving dimension data.

The scaling down of the first data mart will make creating a new model must easier to get a start on a new data warehouse project. The thought to include more floods the mind. Try to put those ideas in a reminder for the second interaction of the project. Remember to check the data types and not be afraid with a more challenging path.

Note

The Relationship between Product, Subcategory and Category can be folded into the Product table in this example. That would mean the Subcategory Name and Category Name would become columns in the Product table. In his case, that is ok and some would say it removes the snowflake in the schema so the model becomes a star schema.

But, in other cases, like Sales Territory, there is a different table between the territories and the fact table. The Reseller Sales needs to go through Reseller dimension while the Internet Sales needs to go through the Customer dimension. Both use the Geography dimension to get the country with links to the Sales Territory dimension.

Next articles in this series:

Useful links


Thomas LeBlanc
Analysis Services (SSAS), Data Warehouse

About Thomas LeBlanc

Thomas LeBlanc is a Data Warehouse Architect in Baton Rouge, LA. Today, he works with designing Dimensional Models in the financial area while using Integration (SSIS) and Analysis Services (SSAS) for development and SSRS & Power BI for reporting. Starting as a developer in COBOL while at LSU, he has been a developer, tester, project manager, team lead as well as a software trainer writing documentation. Involvement in the SQL Server community includes speaking at SQLPASS.org Summits and SQLSaturday since 2011 and has been a speaker at IT/Dev Connections and Live! 360. Currently, he is the Chair of the PASS Excel Business Intelligence Virtual Chapter and worked on the Nomination Committee for PASS Board of Directors for 2016. View all posts by Thomas LeBlanc

168 Views