ETL

[Image: Executing the SSIS Package]

An introduction to SSIS Data Lineage concepts

September 3, 2020 by Aveek Das

In this article, I am going to discuss SSIS data lineage concepts, which are often used when designing ETL workloads for a data warehouse. Although this article focuses on implementing data lineage in SSIS, the concepts are not confined to SSIS; they apply to any ETL tool on the market that moves data from a source to a destination. In my previous article, Understanding Data Lineage in ETL, I discussed the general importance of data lineage for any ETL tool. I suggest reading it if you want a general understanding of how data lineage helps track the source of a single record in the warehouse.
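The tool-independent idea of tracking the source of a single record can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not the SSIS implementation described in the article: it assumes lineage is captured by appending hypothetical columns (`src_name`, `batch_id`, `loaded_at`) to every row during the load, so that any warehouse record can later be traced back to its source file and load run.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class LineageInfo:
    source_name: str  # file or table the rows were extracted from
    batch_id: str     # identifies one execution of the load
    loaded_at: str    # UTC timestamp of the load, ISO 8601


def new_batch(source_name: str) -> LineageInfo:
    """Create lineage metadata for one load run (hypothetical helper)."""
    return LineageInfo(
        source_name=source_name,
        batch_id=str(uuid.uuid4()),
        loaded_at=datetime.now(timezone.utc).isoformat(),
    )


def tag_rows(rows: List[Dict], lineage: LineageInfo) -> List[Dict]:
    """Return copies of the rows with the lineage columns appended."""
    return [
        {**row,
         "src_name": lineage.source_name,
         "batch_id": lineage.batch_id,
         "loaded_at": lineage.loaded_at}
        for row in rows
    ]


def trace(rows: List[Dict], key: str, value) -> List[Tuple[str, str, str]]:
    """Answer 'where did this warehouse record come from?'."""
    return [(r["src_name"], r["batch_id"], r["loaded_at"])
            for r in rows if r.get(key) == value]


batch = new_batch("sales_2020_09.csv")  # hypothetical source file
warehouse = tag_rows([{"order_id": 1, "amount": 10.0},
                      {"order_id": 2, "amount": 5.5}], batch)
print(trace(warehouse, "order_id", 2))
```

Because every row of one run shares the same `batch_id`, a bad load can also be identified and reversed as a unit, which is one common reason to record lineage per batch rather than only per source.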

[Image: Transformation flow diagram]

Understanding Data Lineage in ETL

September 3, 2020 by Aveek Das

In this article, I am going to explain what data lineage in ETL is and how to implement it. Companies today deal with enormous amounts of data every day, which makes it a challenge to manage and monitor that data efficiently. Some systems generate data every second, and that data is processed on its way to a final reporting or monitoring tool for analysis. To process this data, we use a variety of ETL tools, which make data transformation possible in a managed way.

Security Testing with extreme data volume ranges

June 19, 2020 by Timothy Smith

When we develop security tests for situations with inconsistent data volumes, we should consider our use of anti-malware applications that rely on behavioral analysis. Many of these applications are designed to catch and flag unusual behavior. This may help prevent attacks, but it may also interrupt ETL flows and, in turn, affect our customers or clients. While we may have a consistent flow of data throughout a time period, which establishes a normal window of behavior, we may also have an inconsistent data schedule or an inconsistent amount of data that causes these applications to flag files, directories, or the process itself.

[Image: The load consumed 7 seconds for 2.45 million rows during the SQL bulk insert.]

Lock Configurations with SQL Bulk Insert

May 11, 2020 by Timothy Smith

One challenge we may face when using SQL bulk insert is deciding whether to allow or prevent access to the table during the operation, and how to coordinate this with any transactions that follow. We’ll look at a few configurations of this tool and how we may apply them in OLAP, OLTP, and mixed environments where we want to use the tool’s flexibility for our data import needs.
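The main lock-related knob in T-SQL's BULK INSERT is the `TABLOCK` option, which acquires a table-level lock for the duration of the bulk-import operation; when it is omitted, locking behavior is determined by the table option `table lock on bulk load`. `BATCHSIZE` controls how many rows are committed per transaction. The Python helper below is a hypothetical sketch that only assembles the statement so the two configurations can be compared side by side; the table name and file path are made up, and plain string formatting like this is only acceptable for trusted, hard-coded values.

```python
from typing import Optional


def build_bulk_insert(table: str, data_file: str, *, tablock: bool,
                      batch_size: Optional[int] = None) -> str:
    """Build a T-SQL BULK INSERT statement (trusted inputs only).

    tablock=True adds TABLOCK, so a table-level lock is held for the whole
    import; other sessions' reads and writes generally wait until it ends.
    tablock=False leaves locking to the table's 'table lock on bulk load'
    option.
    """
    options = ["FIELDTERMINATOR = ','", "ROWTERMINATOR = '\\n'"]
    if tablock:
        options.append("TABLOCK")
    if batch_size is not None:
        # Each batch of this many rows is committed as its own transaction.
        options.append(f"BATCHSIZE = {batch_size}")
    return (f"BULK INSERT {table}\n"
            f"FROM '{data_file}'\n"
            f"WITH ({', '.join(options)});")


# Exclusive-style load, e.g. for an OLAP staging table loaded off-hours.
print(build_bulk_insert("dbo.SalesStaging", r"C:\import\sales.csv",
                        tablock=True, batch_size=100000))
# Load that leaves locking to the table option, e.g. in a mixed environment.
print(build_bulk_insert("dbo.SalesStaging", r"C:\import\sales.csv",
                        tablock=False))
```

Which variant fits depends on whether other sessions need the table during the load, which is exactly the trade-off the article examines.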

An overview of ETL and ELT architecture

April 21, 2020 by Aveek Das

This article explains the basic features of ETL and ELT and the differences between them. I’m also going to explain in detail what an ELT pipeline is and describe a relevant architecture for it in Azure. So far, we have come a long way with ETL tools, which implement the Extract, Transform and Load technique used to populate a data warehouse. ELT, on the other hand, is another way to load data into a warehouse that follows the process of Extract, Load and Transform.
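The difference in ordering can be made concrete with a small sketch. This is an illustration only, assuming an in-memory SQLite database as a stand-in for the warehouse (it is not the Azure architecture the article describes): the ETL path cleans rows in the pipeline before loading them, while the ELT path loads raw rows into a hypothetical staging table and then transforms them with SQL inside the target.

```python
import sqlite3

# Hypothetical extracted rows: messy names and amounts stored as text.
RAW = [("  alice ", "10.5"), ("BOB", "3")]


def run_etl(conn: sqlite3.Connection) -> None:
    # Transform in the pipeline first...
    cleaned = [(name.strip().capitalize(), float(amount)) for name, amount in RAW]
    # ...then load only the cleaned rows into the target table.
    conn.execute("CREATE TABLE sales_etl (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales_etl VALUES (?, ?)", cleaned)


def run_elt(conn: sqlite3.Connection) -> None:
    # Load the raw rows as-is into a staging table in the target...
    conn.execute("CREATE TABLE stg_sales (customer TEXT, amount TEXT)")
    conn.executemany("INSERT INTO stg_sales VALUES (?, ?)", RAW)
    # ...then transform inside the database engine with SQL.
    conn.execute("CREATE TABLE sales_elt (customer TEXT, amount REAL)")
    conn.execute(
        "INSERT INTO sales_elt "
        "SELECT upper(substr(trim(customer), 1, 1)) || lower(substr(trim(customer), 2)), "
        "CAST(amount AS REAL) FROM stg_sales"
    )


conn = sqlite3.connect(":memory:")
run_etl(conn)
run_elt(conn)
etl_rows = conn.execute("SELECT * FROM sales_etl ORDER BY customer").fetchall()
elt_rows = conn.execute("SELECT * FROM sales_elt ORDER BY customer").fetchall()
print(etl_rows)  # [('Alice', 10.5), ('Bob', 3.0)]
print(elt_rows)  # [('Alice', 10.5), ('Bob', 3.0)]
```

Both paths end with the same cleaned table; the ELT path additionally keeps the raw data in `stg_sales`, and its transformation runs on the warehouse's compute rather than in the pipeline.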

[Image: Class hierarchy for the data flow engine in EzApi]

Biml alternatives: Building SSIS packages programmatically using EzAPI

March 26, 2020 by Hadi Fadlallah

In the previously published article, Biml alternatives: Building SSIS packages programmatically using ManagedDTS, we talked about building SSIS packages using the managed object model of the SSIS engine (ManagedDTS). In this article, we will illustrate another Biml alternative, the EzApi class library, and compare the two technologies.

[Image: The control flow of the package created using ManagedDTS]

Biml alternatives: Building SSIS packages programmatically using ManagedDTS

March 25, 2020 by Hadi Fadlallah

In the previously published articles in this series, we explained how to use Biml to create and manage SQL Server Integration Services (SSIS) packages. In this article, we will talk about the first alternative to this markup language: the Integration Services managed object model (ManagedDTS and related assemblies) provided by Microsoft.

In this article, we will first illustrate how to create, save and execute SSIS packages using ManagedDTS in C#, and then briefly compare this approach with Biml.

[Image: Adding a new Biml script to the solution]

Using Biml scripts to generate SSIS packages

March 13, 2020 by Hadi Fadlallah

In the previous article, Converting SSIS packages to Biml scripts, we explained how to convert existing SSIS packages into Biml scripts using the Import Packages tool, and we mentioned that this can be an efficient way to learn this markup language since it lets the user compare the well-known SSIS objects in the package with the corresponding elements in the generated scripts.

[Image: Create SSIS package control flow screenshot]

Converting SSIS packages to Biml scripts

March 13, 2020 by Hadi Fadlallah

In our previous article, Getting started with Biml, we explained what Biml is, which tools and resources are related to it, and how to get started with this technology. In this article, we will explain how to generate scripts from existing SSIS packages and walk through all the related options. Then, we will analyze the generated script and identify how each object is represented in it.

[Image: Error at OLE DB Destination component during data migration using SSIS's Data Flow Task]

Dynamic column mapping in SSIS: SqlBulkCopy class vs Data Flow

February 14, 2020 by Sifiso W. Ndlovu

The Data Flow Task is an essential component of SQL Server Integration Services (SSIS), as it gives SSIS ETL developers the ability to conveniently extract data from various data sources; perform data transformations ranging from basic and fuzzy to advanced; and migrate data into all kinds of data repository systems. Yet, despite its popularity and convenience, there are instances where the Data Flow Task is simply not good enough, and I recently experienced such shortcomings first-hand. To demonstrate some of the limitations of SSIS’s Data Flow Task, I have put together a random list of the Premier League’s leading goal scorers for the 2019-2020 season.
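In .NET, the `SqlBulkCopy` class exposes a `ColumnMappings` collection that can be populated at run time, whereas a Data Flow Task's column metadata is fixed when the package is designed. The Python sketch below only illustrates the name-matching step that could feed such run-time mappings; the helper name, the case-insensitive matching rule, and the sample column names are assumptions for illustration, not the article's code.

```python
from typing import Dict, List, Tuple


def build_column_mappings(source_cols: List[str],
                          dest_cols: List[str]) -> Tuple[Dict[str, str], List[str]]:
    """Match source columns to destination columns by name.

    Matching ignores case and surrounding whitespace. Returns the mapping
    (source name -> destination name) and the source columns with no match.
    """
    dest_lookup = {c.strip().lower(): c for c in dest_cols}
    mapping: Dict[str, str] = {}
    unmatched: List[str] = []
    for col in source_cols:
        target = dest_lookup.get(col.strip().lower())
        if target is None:
            unmatched.append(col)  # e.g. a new column added to the source file
        else:
            mapping[col] = target
    return mapping, unmatched


source = ["Player", "Club", "Goals", "Assists"]  # hypothetical file header
dest = ["player", "club", "goals"]              # hypothetical table columns
mapping, unmatched = build_column_mappings(source, dest)
print(mapping)    # {'Player': 'player', 'Club': 'club', 'Goals': 'goals'}
print(unmatched)  # ['Assists']
```

Reporting unmatched columns instead of failing lets the load continue when the source gains a column, which is the kind of situation where design-time mappings tend to break.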

[Image: The General tab of the Execute SQL Task]

SQL OFFSET FETCH Feature: Loading Large Volumes of Data Using Limited Resources with SSIS

November 14, 2019 by Hadi Fadlallah

In this article, we illustrate how to use the OFFSET FETCH feature to load large volumes of data from a relational database on a machine with limited memory without running into an out-of-memory exception. We describe how to load data in batches to avoid placing a large amount of data into memory at once.
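The batching pattern itself can be sketched independently of SSIS. The Python helper below is a hypothetical illustration that generates one T-SQL query per batch using `ORDER BY ... OFFSET n ROWS FETCH NEXT m ROWS ONLY`; it assumes the total row count is known up front (for example from a prior `SELECT COUNT(*)`) and that the `ORDER BY` column is a unique key, so pages neither overlap nor skip rows. Each query would be executed and its rows loaded before the next one runs, so only one batch is held in memory at a time.

```python
from typing import List


def batched_queries(table: str, order_by: str,
                    total_rows: int, batch_size: int) -> List[str]:
    """Return one T-SQL SELECT per batch using OFFSET ... FETCH NEXT.

    OFFSET/FETCH requires an ORDER BY clause; ordering by a unique key
    keeps the pages deterministic across queries.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [
        f"SELECT * FROM {table} ORDER BY {order_by} "
        f"OFFSET {offset} ROWS FETCH NEXT {batch_size} ROWS ONLY;"
        for offset in range(0, total_rows, batch_size)
    ]


# Hypothetical table with 250,000 rows loaded in batches of 100,000.
for query in batched_queries("dbo.Orders", "OrderID", 250000, 100000):
    print(query)
```

This prints three statements with offsets 0, 100000 and 200000; the last batch simply returns the remaining 50,000 rows.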
