In the article, SQL Server CHECKPOINT, Lazy Writer, Eager Writer and Dirty Pages in SQL Server, we talked about the CHECKPOINT process for SQL Server databases. This article is about CHECKPOINT in SSIS package.
- Note: Before reading this article, let me just tell you that CHECKPOINT in SSIS package and SQL Server CHECKPOINT are entirely different terms and are not related in any way.
Suppose we have an integration service package that does lots of data transformations and data insertion. The SSIS package fails, and you identify the error following the article Overview of SSIS Package Logging. You fix the error and rerun the package.
Suppose we have 10 steps in the package and it failed after completing 7 steps. If we restart the package, it starts from step 1 and runs the whole package. Assume that overall package execution takes 5 hours to complete and out of which step 1 to 6 takes most of the time approx. 4 hours. Steps 7 to 10 only takes 1 hour. The package failed at step 7 that shows we have already executed it for approximately 4 hours and due to an error, we need to again wait for approximately 5 hours to finish it. It is a critical package, and we cannot afford such failures. In case of failure also, we want to restart the package from the point of failure.
Do you think we can restart SSIS package from the point of failure instead of running complete package?
The answer is yes, we can use SSIS CHECKPOINT to do this task for us. Let’s explore this term in the upcoming section.
Overview of CHECKPOINT in SSIS package
We can configure a CHECKPOINT file in the SSIS package to log package execution information in it. If the package execution fails, SSIS uses the information in the checkpoint file to restart it from the point of failure. Once the package successfully executes, it removes the checkpoint file. On the next package execution, it does not find the checkpoint file and starts the package execution from stretch and creates a new checkpoint file.
We can use CHECKPOINT in the SSIS package to restart the failed package, but it works on the control flow level. We cannot configure a checkpoint to start the package at the data flow level.
- Note: This article assumes you have a basic understanding of SSIS packages. If not, please go through SSIS articles on SQLShack.
I have an SSIS package for my test environment, and it contains the following tasks in the Control flow:
- Data Flow Task
- Execute SQL task to update a few records
The data flow task contains the following source to destination data transfer tasks.
We have not done the configuration of CHECKPOINT in SSIS package. Let’s execute this package, and it should fail at Update Records task.
As expected, the SSIS package fails.
The data flow task is completed successfully, and you can view that 6 rows transferred from source to destination table.
Restart the SSIS package (press Ctrl+Shift+F5), and it starts the package again from the beginning. You get a similar output for this package once again. As the data flow task is executed again, we get duplicate data in the destination table.
We can avoid this situation using CHECKPOINT in SSIS package.
Configuring Checkpoints in SSIS
In the SSIS package, go to package properties using the menu bar (View -> Properties). You can also use the shortcut key F4 to open the following properties window.
In these properties, you can see a tab Checkpoints with following configurations:
- CheckpointFileName: It is the file in which the SSIS package will log package execution information. You can click on it and browse to the destination path or paste the folder location along with file name in XML format. It should be a valid directory, and you can use a network UNC path as well, but the appropriate permissions should be set else you get an error message that the package could not access the checkpoint file
- CheckpointUsage: It gives the following checkpoint file usage options
- Never: SSIS package will never use the checkpoint feature. It is the default option
- IfExists: In this option, the SSIS package will read the information in the checkpoint file if it is already there. If not, it will skip this and restart the package. We should use this option in most cases
- Always: We can configure the package to Always use checkpoint information from the checkpoint file. If the information is not available, the SSIS package will throw an error message
- SaveCheckpoints: We can choose to save the checkpoint information or not
Let’s do the following CHECKPOINT in SSIS package configuration:
- Checkpointusage – IfExists
- SaveCheckpoints: True
Execute the package, and it will fail against, but we do not see the checkpoint file in the configured path. In the Execution Results tab, we get the following message.
- Warning: This task or container has failed, but because FailPackageOnFailure property is FALSE, the package will continue. This warning is posted when the SaveCheckpoints property of the package is set to TRUE, and the task or container fails.
We are also required to configure a property FailPackgaeOnFailure for all control flow tasks. Select the control flow tasks and change the value of FailPackgaeOnFailure to True.
Execute the package, and you can see a checkpoint file in the configured path. Open the file, and you can see the content of this file as follows.
Fix the error message and rerun the package. It should only run the failed task. Package executes successfully, and you can note that it only executed the Update Records task. The data flow task was successful previously. Therefore, you do not see any status for the Data flow task because it is not executed this time.
Go back to the location of CHECKPOINT in SSIS package file, and you will not find the file. The SSIS package removes this file once the package execution completes successfully.
Let’s look at one more failure example. I modified my SSIS package to include the following in the data flow task. It includes two tasks for transferring data from source to destination tables.
I also modified the control flow task to execute the Update Record task first and then it should run the data flow task. You can see the following modified SSIS package.
Once we execute the package, it should fail one of the data flow tasks. We already have CHECKPOINT in SSIS package configuration for this package.
Double-click on this data flow task, and it shows the second task as failed.
You can also see a checkpoint file, and it has the following contents in it.
In the Progress bar, we also see information about the checkpoint file.
Fix the error and execute the package again. Before the execution, what do you think will happend?
Will the package execute only the failed task in the data flow task, or will it execute both tasks inside the data flow task?
As specified previously, CHECKPOINT in SSIS package works on the control flow level. We cannot configure it to execute the individual task at the data flow level.
Let’s execute the package and note the behaviors in the package.
It does not execute the Update Records task because this task was executed successfully in the previous run. SSIS gets this information from the CHECKPOINT file and skips this part
Data flow task is successful, but if we open the task details, we can see it executed both tasks. It does not skip task 1 that was successful in previous runs. It validates the above point that we cannot configure checkpoint at the data flow level. It works only at control flow task level
Open the progress bar, and in there you can see an information message that the package restarted from the checkpoint file. The package was configured to restart from the checkpoint
In this article, we explored the CHECKPOINT in the SSIS package and its usage to restart packages from the point of failure. It is an important feature especially for the large and involved packages that involve multiple steps, downloading files from locations such as SFTP. We can easily configure the package to use this feature. I would recommend to go through the steps and configure it for your environment.