Hadi Fadlallah
This image shows the for loop container used to process the OLAP cube partitions

An efficient approach to process a SSAS multidimensional OLAP cube

August 15, 2019 by

Introduction

While building and deploying an SSAS OLAP cube, there are two processing orders that you can choose from when you create a process operation:

  • Parallel (All Objects will be processed in a single transaction): Used for batch processing, all tasks run in parallel inside one transaction.
  • Sequential (Transaction mode):
    • One Transaction: All Tasks are executed in one transaction
    • Separate Transactions: Every Task is executed in one transaction

As mentioned above, when choosing a parallel processing order, tasks are processed in parallel (specific degree of parallelism) but if one task fails all operations are rolled back, even if it was the last one. This may lead to a serious problem when processing a huge OLAP cube that contains lots of partitions.

In a similar case, using a sequential processing order will decrease the processing performance since tasks are processed sequentially which delays delivery.

One of the best workarounds would be processing partitions using a parallel option but in batches; each process operation (batch) will contain a specific number of partitions (tasks). In case that an error occurs during processing phase, only one batch tasks are rolled back.

I proposed this approach for the first time as a solution to a similar problem on stackoverflow.com. Since it addresses a wide audience, I decided to improve it and write it as a separate article.

To learn to build a cube from scratch using SSAS, I would recommend you to run through this interesting article, How to build a cube from scratch using SQL Server Analysis Services (SSAS).

This article contains a step-by-step guide for implementing this processing logic within a SQL Server Integration Services package.

Prerequisites

In order to implement this process, you have to make sure that SQL Server Data Tools are installed to be able to create Integration Services packages and to use the Analysis Services tasks. More information can be found at Download and install SQL Server Data Tools (SSDT) for Visual Studio.

Also, the OLAP cube must be created and deployed (without processing dimensions and cube).

Creating a package and preparing the environment

Adding variables

First, we have to create a new Integration Services project using Visual Studio, and then we have to create all the needed variables as described below:

Variable name

Data type

Description

inCount

Int32

Stores unprocessed partitions count

intCurrent

Int32

Used within for loop container

p_Cube

String

The OLAP cube object id

p_Database

String

The SSAS database id

p_MaxParallel

Int32

Degree of parallelism

p_MeasureGroup

String

The Measure Group object id

p_ServerName

String

The Analysis Server Instance Name

strProcessData

String

Stores XMLA to process partitions Data

strProcessIndexes

String

Stores XMLA to process partitions indexes

strProcessDimensions

String

Stores XMLA to process dimensions

The following image shows the Variables Tab in Visual Studio.

This image shows the variables added in Visual Studio

Figure 1 – Variables needed

Note that, all variables names starting with “p_” can be considered as parameters

Adding Connection Managers

After defining variables, we must create an OLE DB connection manager in order to connect to the SQL Server Analysis Service Instance:

  1. First, we must open the connection manager and configure it manually:

This image shows the OLE DB connection manager used to connect to the SSAS analysis services database /  OLAP Cube

Figure 2 – OLE DB Connection manager configuration

  1. Then we must open the Expression editor form from the properties Tab and set the Server name expression to @[User::p_ServerName] and Initial Catalog expression to @[User::p_Database] as shown in the image below:

This image shows the OLE DB connection manager property expressions editor

Figure 3 – Connection Manager Expressions editor form

  1. Rename the OLE DB connection manager to “ssas”:

This image shows the Connection managers Tab in Visual Studio which contains the OLAP cube connection manager

Figure 4 – Connection managers Tab

Processing Dimensions

In order to process dimensions, we must use a Sequence Container to isolate the dimension processing within the package. Then we must add a Script Task to prepare the processing commands and an Analysis Services Processing Task to execute them:

This image shows the SSIS Toolbox with Script Task and Analysis Services Processing Task highlighted

Figure 5 – Picking Tasks from SSIS Toolbox

This image shows the Process dimensions sequence container which contains the script task and Analysis Services Processing Task

Figure 6 – Process Dimensions sequence container

In the Script Task configuration form, we must select @[User::p_Database], @[User::p_MaxParallel] as ReadOnly Variables and @[User::strProcessDimensions] as ReadWrite variable as shown in the image below:

This image shows the First Script Task configuration

Figure 7 – First Script Task configuration

Now, Open the Script editor and use the following code (C#):

The following code is to prepare the XMLA command to process the dimensions. We used AMO libraries to read the SSAS database objects, loop over the dimensions and generate the XMLA query to be used in the Analysis Services Processing Task:

After configuring the Script Task, we have to open the Analysis Services Processing Task and to define any valid task manually, then from the Properties Tab we must open the expressions editor form and set the ProcessingCommands property to @[User::strProcessDimensions] variable as shown in the image below:

This image shows the Analysis Services Processing Task property Expressions editor

Figure 8 – Analysis Services Processing Task expressions form

Get the unprocessed partitions count

In order to process partitions in batches, we must first to get the unprocessed partitions count in a measure group. This can be done using a Script Task. Open the Script Task configuration form, and select @[User::p_Cube], @[User::p_Database], @[User::p_MeasureGroup] , @[User::p_ServerName] variables as ReadOnly variables and @[User::intCount] as a ReadWrite variable as shown in the image below:

This image shows the second Script Task configuration form

Figure 9 – Second Script Task configuration

Open the Script Editor and write the following C# script:

This script reads the SSAS database objects using AMO libraries, and retrieves the number of unprocessed partitions within the OLAP cube Measure group, then stores this value within a variable to be used later.

Process Partitions in batches

Finally, we have to create a For loop container to loop over OLAP cube partitions in chunks. In the For Loop Editor, make sure to set the For Loop Properties as following:

  • InitExpression: @intCurrent = 0
  • EvalExpression: @intCurrent < @intCount
  • AssignExpression: @intCurrent + @p_MaxParallel

Make sure the For Loop Editor form looks like the following image:

This image shows the for loop container configuration with expressions highlighted

Figure 10 – For Loop configuration

Within the For Loop container, add a Script Task to prepare the XMLA commands needed to process the Data and indexes of the partitions, and add two Analysis Services Processing Task to execute these commands as shown in the image below:

This image shows the for loop container used to process the OLAP cube partitions

Figure 11 – For loop container

Open the Script Task configuration form and select @[User::p_Cube], @[User::p_Database], @[User::p_MaxParallel], @[User::p_MeasureGroup] as ReadOnly Variables, and select @[User::strProcessData], @[User::strProcessIndexes] as ReadWrite Variables. The Script Task Editor should looks like the following image:

This image shows the third Script Task configuration form

Figure 12 – Third Script Task configuration

In the script editor, write the following script:

The Script is to prepare the XMLA commands needed to process the partitions Data and Indexes separately. In this Script, we use AMO libraries to read SSAS database objects, loop over OLAP cube partitions and to generate two XMLA query that executes n partitions in parallel (10 in this example) as a single batch (one query for processing data and another one to process indexes). Then we store each XML query within a variable to be used in the SSAS processing task.

Now Open both Analysis Services Processing Task and define any valid task manually (just to validate the task). , then from the Properties Tab we must open the expressions editor form and set the ProcessingCommands property to @[User::strProcessData] variable in the First Task and @[User::strProcessIndexes] in the second one.

The package control flow should looks like the following:

This image is a screenshot of SSIS package control flow

Figure 13 – Control flow overview

Now, the package is ready. If an error occurs during processing, only the current batch will rollback.

Disadvantages and Possible improvements

One of the most critical disadvantages of this approach is that it can lead to inconsistent values if the OLAP cube is online and not all partitions are processed. Therefore, it has to be executed in off-hours or on a separate server then deployed to the production server after being processed.

Besides this, many improvements can be done for this process:

  1. We can configure some logging tasks to track the package progress, especially when dealing with a huge number of partitions
  2. This example is to process one measure group partitions; it can be expanded to manipulate all measure groups defined in the OLAP cube. To do that, we need to add a Script Task to get all measure groups in the SSAS database, then we should add a Foreach Loop container (Variable enumerator) to loop over measure groups and put the For loop container we already created within it

Some helpful Links

In the end, I will provide some external links that contain additional information that can serve to improve this solution:

Hadi Fadlallah
Analysis Services (SSAS), Integration Services (SSIS), OLAP cube

About Hadi Fadlallah

Hadi is an SQL Server professional with more than 10 years of experience. His main expertise is in data integration. He's one of the top ETL and SQL Server Integration Services contributors at Stackoverflow.com . Also, he published several article series about Biml, SSIS features, Search engines, Hadoop, and many other technologies. Besides working with SQL Server, he worked with different data technologies such as NoSQL databases, Hadoop, Apache Spark. He is a MongoDB, Neo4j, and ArangoDB certified professional. On the academic level, Hadi holds two master's degrees in computer science and business computing. Currently, he is a Ph.D. candidate in data science focusing on Big Data quality assessment techniques. Hadi really enjoys learning new things everyday and sharing his knowledge. You can reach him on his personal website. View all posts by Hadi Fadlallah

168 Views