Aveek Das

Setting up a PostgreSQL Database on Mac

April 23, 2021 by Aveek Das

In this article, I am going to discuss different ways in which you can install and set up a Postgres database on a Mac. Postgres is an open-source relational database system that can be used to develop a wide variety of data-driven applications. It has also become popular for analytical workloads, with columnar storage and in-memory options available through extensions. Postgres is also offered as a managed service on all the major public clouds, such as AWS, Azure, and GCP. Before using those services, it is recommended that you install Postgres on your local machine as well, rather than deploying your databases directly to the cloud.

Spinning up MySQL instances on RDS using CloudFormation Templates

April 20, 2021 by Aveek Das

In this article, we are going to discuss how to set up a MySQL instance on AWS RDS using CloudFormation templates. In my previous article, How to configure an Amazon RDS environment for MySQL, I provided a detailed walkthrough of how to set up a MySQL instance on Amazon RDS. You can use the AWS console to provide all the information required to set up the instance and then start using it. In this article, however, we will discuss an automated way of achieving the same result using CloudFormation templates.

Introduction to Apache Spark

April 12, 2021 by Aveek Das

In this article, I am going to discuss Apache Spark and how to create robust ETL pipelines for transforming big data. I will start from the very basics of Spark and then explain how to install Spark and start building the pipelines. In the later part of the article, I will also discuss how to use the Spark APIs to transform data, load it into Spark DataFrames, and query it with Spark SQL to continue with the data analysis.

Getting started with Amazon Athena and S3

April 7, 2021 by Aveek Das

In this article, I am going to discuss Amazon Athena and how we can analyze data stored in S3 using Athena. As you might know, AWS offers a large number of services in fields such as compute, databases, analytics, machine learning, and robotics, and one of the most important and popular of these is Amazon Athena. By the official definition, “Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run.”

Understanding *args and **kwargs arguments in Python

April 2, 2021 by Aveek Das

In this article, I am going to talk in detail about functions and arguments in Python. Python is one of the most popular and in-demand programming languages. In recent years, many programmers have taken an interest in Python, and as a result there is a large, constantly evolving community around it. Python is also considered one of the most flexible languages, since it can be used to develop web applications and REST APIs, and it is also widely used in scientific computing for data analysis and machine learning.
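The teaser does not show the syntax itself, so here is a minimal sketch of how `*args` collects extra positional arguments into a tuple, how `**kwargs` collects extra keyword arguments into a dict, and how `*` and `**` unpack an iterable or a dictionary at the call site. The function names and sample values are hypothetical and chosen only for illustration.

```python
def describe_call(*args, **kwargs):
    """Return exactly what Python packed into *args and **kwargs."""
    # args is a tuple of the extra positional arguments
    # kwargs is a dict of the extra keyword arguments
    return args, kwargs


def add_all(*numbers):
    """Sum any number of positional arguments (hypothetical helper)."""
    return sum(numbers)


def build_profile(name, **details):
    """Combine a required name with any keyword arguments (hypothetical helper)."""
    profile = {"name": name}
    profile.update(details)
    return profile


values = [1, 2, 3]
options = {"city": "London", "role": "engineer"}

print(describe_call(1, 2, key="value"))   # ((1, 2), {'key': 'value'})
print(add_all(*values))                   # 6: the list is unpacked into positional arguments
print(build_profile("Alice", **options))  # {'name': 'Alice', 'city': 'London', 'role': 'engineer'}
```

Because `*numbers` accepts zero or more arguments, `add_all()` with no arguments simply returns `0`.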

Working with JSON data in Python

March 30, 2021 by Aveek Das

In this article, I am going to write about the various ways we can work with JSON data in Python. JSON stands for JavaScript Object Notation and has become one of the most important formats for storing data and transferring it across systems. This is due to its easy-to-understand structure and also because it is very lightweight. You can easily write both simple and nested data structures in JSON, and programs can read them as well. In my opinion, JSON is much more human-readable than XML, although both are used to store and transfer data. In modern web applications, JSON is the default choice for transferring information.
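To make the point about simple and nested structures concrete, here is a minimal sketch using Python's built-in `json` module: `json.dumps` serializes a nested dictionary to a readable JSON string and `json.loads` parses it back. The record's fields are hypothetical sample data.

```python
import json

# A nested Python structure: dicts become JSON objects, lists become JSON arrays,
# True becomes true and None becomes null
record = {
    "name": "sample-service",
    "version": 2,
    "enabled": True,
    "tags": ["api", "internal"],
    "owner": {"team": "data", "contact": None},
}

# Serialize to a human-readable JSON string (indent adds line breaks and spacing)
text = json.dumps(record, indent=2)
print(text)

# Parse the string back into Python objects
parsed = json.loads(text)
print(parsed["owner"]["team"])  # data
print(parsed == record)         # True: the round trip preserves the structure
```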

Setting up a Machine Learning environment using R and RStudio

March 23, 2021 by Aveek Das

In this article, I am going to introduce a few concepts on how to set up and get started with R and RStudio to perform machine learning workloads. Whether to choose Python or R for machine learning analysis has long been a hotly debated topic. In my opinion, both languages excel in their own space, and a direct point-by-point comparison between the two is not very meaningful. Mathematicians and statisticians tend to prefer working within the R environment, while programmers tend to choose Python.

An introduction to AWS Athena

March 19, 2021 by Aveek Das

In this article, I am going to introduce AWS Athena, a service offered by Amazon that allows users to query data in S3 using standard SQL syntax. AWS is considered to be a leader in the cloud computing world. Amazon offers more than a hundred services that provide competitive performance and cost-effective ways to run workloads compared to on-premises architectures. These services range widely across compute, storage, databases, analytics, IoT, security, and a lot more. One of the popular areas among these services is the Analytics domain, which allows customers to build architectures that answer the key questions behind their business decisions.

Create REST APIs in Python using Flask

March 12, 2021 by Aveek Das

In this article, I am going to explain what a REST API is all about and how to get started with creating APIs in Python using Flask. In modern software, REST APIs play a major role as a communication channel between different services. They have become the de facto standard for passing information across multiple systems in the JSON format, largely because REST provides a uniform interface for exchanging messages between two different systems. Let us learn more about REST APIs in this article.

Overview of EC2 Instance Types in AWS

March 8, 2021 by Aveek Das

In this article, I am going to talk about the various EC2 instance types available in AWS. EC2, short for Elastic Compute Cloud, is an IaaS offering from AWS that lets customers provision virtual machines in the cloud with different combinations of CPU, RAM, disk, and networking. Many predefined instance types are already available in the AWS console, which makes it very easy to spin up a new EC2 instance.

Preparing for the AWS Certified Cloud Practitioner (CCP) exam

March 1, 2021 by Aveek Das

In this article, I am going to discuss the AWS Certified Cloud Practitioner exam. Cloud computing is one of the fastest-moving technologies in today’s world. With the rising demand for cloud computing platforms, more and more companies have already started using the cloud or are in the process of moving their infrastructure to it. When it comes to choosing a cloud vendor, AWS is preferred by most major companies, with Azure in second place. Because of this demand, companies are continuously looking for talented individuals who can help them lift and shift their infrastructure or advise them on their existing cloud infrastructure.

Exporting data with Pandas in Python

February 24, 2021 by Aveek Das

In this article, I am going to discuss the various ways in which we can use Pandas in Python to export data to a database table or a file. In my previous article, Getting started with Pandas in Python, I explained in detail how to get started with analyzing data in Python. Pandas is one of the most popular libraries for data analysis. It is very easy and intuitive to use. Personally, I love using the library because of its ease of use and the great documentation that is available online.
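As a minimal sketch of the two export targets mentioned above, the snippet below writes a small DataFrame to CSV with `DataFrame.to_csv` and to a database table with `DataFrame.to_sql`. It assumes pandas is installed and uses an in-memory SQLite database (pandas accepts a `sqlite3` connection directly) instead of a production database; the table name and sample data are hypothetical.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical sample data standing in for the result of an analysis
df = pd.DataFrame(
    {
        "product_id": [1, 2, 3],
        "name": ["Keyboard", "Mouse", "Monitor"],
        "price": [25.0, 10.5, 149.99],
    }
)

# Export to CSV; a StringIO buffer stands in for a real file path here
csv_buffer = io.StringIO()
df.to_csv(csv_buffer, index=False)  # index=False omits the DataFrame index column
print(csv_buffer.getvalue())

# Export to a database table; if_exists="replace" drops and recreates the table
conn = sqlite3.connect(":memory:")
df.to_sql("products", conn, index=False, if_exists="replace")

# Query the table to confirm that every row was exported
print(conn.execute("SELECT COUNT(*) FROM products").fetchone()[0])  # 3
```

With a server database such as SQL Server or Postgres, `to_sql` would normally be given a SQLAlchemy engine instead of a raw `sqlite3` connection.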

Advanced usages of Data-Tier applications

February 12, 2021 by Aveek Das

In this article, I am going to explain some of the advanced usages of data-tier applications in Visual Studio. In my previous article, Working with Database Projects, I explained how you can start building your database applications for SQL Server and Azure SQL Database using Visual Studio. This article will specifically focus on using SQLCMD variables and Publish Profiles in Data-Tier Application development. For a better understanding, I recommend reading the previous article first, as it covers the basic concepts.

Best practices to follow while programming in Python

February 9, 2021 by Aveek Das

In this article, I am going to discuss some of the best practices that a programmer should follow while programming in Python. Python as a language has evolved to a great extent over the last few decades and has gained popularity amongst software programmers, data enthusiasts, and system administrators. This is largely due to how easy it is to write code in Python and the large community behind it.

Understanding AWS Billing services and concepts

February 3, 2021 by Aveek Das

In this article, I am going to explain AWS Billing services and the underlying concepts that one should be aware of while working with AWS. As more and more companies take the essential step of migrating their existing applications to the cloud, it has become important for engineers to keep pace and learn cloud technologies. In today’s market, AWS and Azure are the two most widely used cloud providers. Google Cloud Platform (GCP) is also becoming popular; however, the demand for AWS is the highest.

An overview of AWS Well-Architected Principles

January 27, 2021 by Aveek Das

In this article, I am going to explain the AWS Well-Architected Framework, which helps AWS customers follow best practices when designing the architectures of their solutions. It enables users to design secure, reliable, and high-performing cloud applications and workloads. It is more of a theoretical set of guidelines that is often recommended when thinking about the architecture of any system. There are five pillars of the AWS Well-Architected Framework that enable customers to evaluate their existing architectures and implement scalable solutions. In this article, we will learn more about those five pillars and the best practices around them. The discussion below is a summarized form of the official whitepaper: AWS Well-Architected Framework.

Working with Database Projects

January 22, 2021 by Aveek Das

In this article, I am going to talk about developing and deploying a database project, also known as a data-tier application, using Visual Studio. In my previous article, Getting started with Data-Tier Applications using Visual Studio, I provided an overview of data-tier applications and how we can create one using Visual Studio. This article is a follow-up to that one, so I’d advise you to read it before proceeding. For this article, I will be using Visual Studio 2019; however, you are free to use other versions of Visual Studio.

Getting started with Data-Tier applications in Visual Studio

January 15, 2021 by Aveek Das

In this article, I am going to talk about creating a data-tier application using Visual Studio. In my previous article, An introduction to Data-Tier applications in SQL Server, I explained in detail what a data-tier application is all about. I also explained the different types of data-tier applications available and how we can create such applications from existing SQL Server databases. In this article, the primary focus will be on creating data-tier applications from scratch using Visual Studio. For this article, I am going to use Visual Studio 2019; however, the technique will remain similar for other versions of Visual Studio as well.

An overview of AWS IAM

January 13, 2021 by Aveek Das

In this article, I am going to introduce the concept of AWS IAM, also known as Identity and Access Management. In any cloud service, controlling who has access to the services, and how each service accesses the others, is an important task. If access is not controlled or restricted, security breaches may occur within the services, and we might not even be able to track them. As a best practice for restricting and controlling access within AWS, there is a dedicated service called IAM that can be used to manage and control access to almost everything in AWS. It is the permission control system that governs access to the various AWS resources and services.

Advanced Usages of SQL Server Agent

January 8, 2021 by Aveek Das

In this article, I am going to introduce some advanced usages of the SQL Server Agent service in Microsoft SQL Server. In my previous article, Introduction to SQL Server Agent, I discussed in detail how to use the service and the various components related to it. To recap briefly, the SQL Server Agent is a job scheduler service within SQL Server that allows us to schedule T-SQL scripts and SSIS jobs, automate database backups, and perform other tasks. In the last article, I showed how to schedule a simple T-SQL script using the SQL Server Agent. This article will focus more on advanced concepts like scheduling an SSIS package and processing an OLAP cube.

Logging messages from Azure Functions to Azure SQL Database

December 25, 2020 by Aveek Das

In this article, I am going to explain how to create a serverless application using Azure Functions and use Azure SQL Database to log the messages generated by the function. In today’s world of cloud-based applications, it is very important to know how to design and create serverless applications. An important aspect of designing any application is generating log messages at every key step or operation being performed. These logs help us understand the workflow whenever an issue arises that needs debugging at a later point in time.

Introduction to the SQL Server Agent

December 15, 2020 by Aveek Das

In this article, I am going to explain the SQL Server Agent service in detail. This is a Windows service that enables database developers and database administrators to schedule jobs on the SQL Server machine. The jobs can be simple T-SQL scripts, stored procedures, SSIS packages, or SSAS databases. This service is available in all editions of SQL Server except the Express edition.

Understanding Change Tracking in SQL Server using Triggers

December 9, 2020 by Aveek Das

In this article, I am going to explain what change tracking is in SQL Server and why we need it. I will also illustrate it with some practical examples that use triggers in SQL Server. Change tracking, as the name suggests, is a mechanism that helps us identify the changes made to the database as the application grows. In other words, it gives us a history of the changes that have been made to one or more tables in the database. Each change is either an INSERT, an UPDATE, or a DELETE.
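The idea of keeping a history table filled by triggers can be sketched without a SQL Server instance. The snippet below is a minimal, runnable illustration using SQLite through Python's built-in `sqlite3` module, so the trigger syntax differs from T-SQL: SQLite exposes the before and after row values as `OLD` and `NEW`, whereas SQL Server triggers read from the `deleted` and `inserted` pseudo-tables. The `products` and `products_history` tables are hypothetical.

```python
import sqlite3

# In-memory SQLite database; SQLite trigger syntax is not the same as SQL Server T-SQL
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE products (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        price REAL NOT NULL
    );

    -- History table: one row per change made to products
    CREATE TABLE products_history (
        history_id INTEGER PRIMARY KEY AUTOINCREMENT,
        product_id INTEGER,
        operation  TEXT,
        old_price  REAL,
        new_price  REAL,
        changed_at TEXT DEFAULT CURRENT_TIMESTAMP
    );

    -- NEW is the row after the change, OLD is the row before it
    CREATE TRIGGER trg_products_insert AFTER INSERT ON products
    BEGIN
        INSERT INTO products_history (product_id, operation, old_price, new_price)
        VALUES (NEW.id, 'INSERT', NULL, NEW.price);
    END;

    CREATE TRIGGER trg_products_update AFTER UPDATE ON products
    BEGIN
        INSERT INTO products_history (product_id, operation, old_price, new_price)
        VALUES (NEW.id, 'UPDATE', OLD.price, NEW.price);
    END;

    CREATE TRIGGER trg_products_delete AFTER DELETE ON products
    BEGIN
        INSERT INTO products_history (product_id, operation, old_price, new_price)
        VALUES (OLD.id, 'DELETE', OLD.price, NULL);
    END;
    """
)

# Each statement below fires one of the triggers and adds a history row
conn.execute("INSERT INTO products (id, name, price) VALUES (1, 'Keyboard', 25.0)")
conn.execute("UPDATE products SET price = 30.0 WHERE id = 1")
conn.execute("DELETE FROM products WHERE id = 1")
conn.commit()

history = conn.execute(
    "SELECT product_id, operation, old_price, new_price "
    "FROM products_history ORDER BY history_id"
).fetchall()
for row in history:
    print(row)
# (1, 'INSERT', None, 25.0)
# (1, 'UPDATE', 25.0, 30.0)
# (1, 'DELETE', 30.0, None)
```

Even though the product row is gone after the DELETE, the history table still records every change that was made to it.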

Documenting SSIS Packages using Sequence Diagrams

November 25, 2020 by Aveek Das

In this article, I am going to explain in detail how to document SSIS packages using Sequence Diagrams and why these diagrams matter in software engineering, no matter which programming language you are using. In my previous article, I talked about the various UML Diagrams that are used to document software engineering processes. I also discussed modular ETL architecture and how to create such a modular package in SSIS. Sequence diagrams are part of the broader family of UML Diagrams; they define the interactions between the various components of a system in chronological order.

Implementing a Modular ETL in SSIS

November 24, 2020 by Aveek Das

In this article, I am going to demonstrate how to implement a Modular ETL in SSIS in practice. In my previous article, Designing a Modular ETL Architecture, I explained the theory of what a modular ETL solution is and how to design one. We also covered the concepts behind a modular ETL solution and its benefits in the world of data warehousing, and we related the concept of microservices architecture in software development to that of the modular ETL solution.
