Python

Dejan Sarka

Data science, data understanding and preparation – entropy of a discrete variable

May 14, 2018

In the conclusion of my last article, Data science, data understanding and preparation – binning a continuous variable, I wrote something about preserving information when you bin a continuous variable into bins with an equal number of cases. In this article, I explain that statement. I will show you how to calculate the information stored in a discrete variable by explaining the measure of information, namely entropy.
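A minimal sketch of the calculation, assuming Shannon entropy measured in bits: for a discrete variable whose states occur with probabilities p_i, H = -Σ p_i log2 p_i. The function names `entropy` and `normalized_entropy` are illustrative and not taken from the article. Dividing by log2 of the number of distinct states is one common way to express how close a variable is to the maximum possible entropy, which is reached when every state has the same number of cases, as with equal-frequency bins.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy, in bits, of a discrete variable given as a list of its values."""
    n = len(values)
    counts = Counter(values)
    # H = sum over states of p * log2(1 / p), which equals -sum(p * log2(p))
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def normalized_entropy(values: Sequence[Hashable]) -> float:
    """Entropy divided by the maximum possible entropy, log2(number of distinct states)."""
    states = len(set(values))
    if states < 2:
        return 0.0  # a constant variable carries no information
    return entropy(values) / math.log2(states)


if __name__ == "__main__":
    # Hypothetical binned values: equal-frequency bins vs. a skewed distribution
    equal_frequency = ["low", "low", "mid", "mid", "high", "high"]
    skewed = ["low", "low", "low", "low", "mid", "high"]
    print(entropy(equal_frequency), normalized_entropy(equal_frequency))
    print(entropy(skewed), normalized_entropy(skewed))
```

With equal-frequency bins the normalized entropy is 1 (up to floating-point rounding), while the skewed variable with the same three states scores noticeably lower.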

Dejan Sarka

Data science, data understanding and preparation – binning a continuous variable

April 23, 2018

I started to explain the data preparation part of a data science project with discrete variables. As you should know by now, discrete variables can be categorical or ordinal. For ordinal variables, you have to define the order, either through the values of the variable or by informing the R or Python execution engine about the order. Let me start this article with Python code that shows another way to define the order of the Education variable from the dbo.vTargetMail view in the AdventureWorksDW2016 demo database.
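One way to tell Python about the order is a pandas ordered categorical. The sketch below assumes the five Education values found in dbo.vTargetMail, listed from lowest to highest, and uses a small hypothetical in-memory sample instead of a database query; it illustrates the idea and is not necessarily the code the article uses.

```python
import pandas as pd

# Assumed Education values, ordered from the lowest to the highest level
education_order = [
    "Partial High School",
    "High School",
    "Partial College",
    "Bachelors",
    "Graduate Degree",
]

# Hypothetical sample standing in for rows read from dbo.vTargetMail
df = pd.DataFrame(
    {"Education": ["Bachelors", "High School", "Graduate Degree",
                   "Partial College", "Bachelors", "Partial High School"]}
)

# Declare the variable as ordinal: explicit categories plus ordered=True
df["Education"] = pd.Categorical(
    df["Education"], categories=education_order, ordered=True
)

# Frequencies follow the defined order instead of the alphabetical one
counts = df["Education"].value_counts(sort=False)
print(counts)

# Ordered categoricals also sort and compare according to that order
print(df.sort_values("Education"))
print(df[df["Education"] > "Partial College"])
```

Because the order is stored in the column's dtype, every later step (sorting, comparisons, a bar chart of `counts`) respects it without extra code.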

Dejan Sarka

Data science, data understanding and preparation – ordinal variables and dummies

March 29, 2018

In my previous article, Introduction to data science, data understanding and preparation, I showed how to get an overview of the distribution of a discrete variable. I analyzed the NumberCarsOwned variable from the dbo.vTargetMail view that you can find in the AdventureWorksDW2016 demo database. The graphs I created in R and Python and the histogram created with T-SQL were all very nice. Now let me try to create a histogram for another variable from that view, the Education variable. I am starting with R, as you can see from the following code.
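A frequency table is the numeric counterpart of such a histogram. The following minimal sketch, using only the Python standard library, computes absolute, relative, and cumulative frequencies for a discrete variable. The function name `frequency_table` is hypothetical, and the sample values merely stand in for NumberCarsOwned; they are made up for illustration. The optional `order` argument lets an ordinal variable such as Education be listed in its logical order rather than alphabetically.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Optional, Tuple


def frequency_table(
    values: Iterable[Hashable], order: Optional[List[Hashable]] = None
) -> List[Tuple[Hashable, int, float, float]]:
    """Return (value, count, percent, cumulative percent) rows for a discrete variable.

    If `order` is given, rows follow it (useful for ordinal variables);
    otherwise they are sorted by the values themselves.
    """
    counts = Counter(values)
    total = sum(counts.values())
    keys = order if order is not None else sorted(counts)
    rows = []
    cumulative = 0
    for key in keys:
        count = counts.get(key, 0)
        cumulative += count
        rows.append((key, count, 100.0 * count / total, 100.0 * cumulative / total))
    return rows


if __name__ == "__main__":
    # Hypothetical NumberCarsOwned values
    cars = [0, 1, 2, 2, 1, 0, 2, 3, 4, 2]
    for value, count, pct, cum_pct in frequency_table(cars):
        bar = "*" * count  # crude text histogram
        print(f"{value:>3} {count:>3} {pct:6.2f}% {cum_pct:7.2f}%  {bar}")
```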

Dejan Sarka

Introduction to data science, data understanding and preparation

March 14, 2018

Data science, machine learning, data mining, advanced analytics, or however you want to name it, is a hot topic these days. Many people would like to start a project in this area. However, very soon after the start you realize you have a huge problem: your data. Your data might come from your line-of-business applications, data warehouses, or even external sources. Typically, it is not ready for advanced analytical algorithms straight out of the source. In addition, you have to understand your data thoroughly; otherwise you might feed the algorithms with inappropriate variables. Soon you learn what seasoned data scientists already know: around 70-80% of the time dedicated to a data science project goes to data preparation and understanding.

Prashanth Jayaram

The importance of Python in SQL Server Administration

January 8, 2018

Some of my previous articles on Python provided insight into the basics of Python and its use in SQL Server 2017.

This article is an effort to collect the missing pieces and showcase the importance of using Python in SQL Server.

Gerald Britton

Get more out of Python on SQL Server 2017

January 5, 2018

Introduction

One of the new features announced with SQL Server 2017 is support for the Python language. This is big! In SQL Server 2016, Microsoft announced support for the R language – an open source language ideally suited for statistical analysis and machine learning (ML). Recognizing that many data scientists use Python with ML libraries, Microsoft has now added the easy-to-learn, hard-to-forget language to the SQL Server ML suite.

There’s a big difference between R and Python, though: R is a domain-specific language, while Python is general-purpose. That means the full power of Python is available within SQL Server. This article leaves ML aside for the moment and explores a few of the other possibilities.

Prashanth Jayaram

Data Interpolation and Transformation using Python in SQL Server 2017

November 21, 2017

As a continuation of my previous article, How to use Python in SQL Server 2017 to obtain advanced data analytics, a little curiosity about deep learning with Python integration in SQL Server led me to write this latest article.

With Python running within SQL Server, you can bring the existing data and the code together. Data is accessible directly, so there’s no need to extract query result sets and move data from storage to the application. It’s a useful approach, especially considering data sovereignty and compliance, since the code runs within the SQL Server security boundaries, triggered by a single call from a T-SQL stored procedure.
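To make the in-database pattern concrete: with SQL Server Machine Learning Services, a T-SQL call to sp_execute_external_script with @language = N'Python' hands the result of the @input_data_1 query to the script as a pandas DataFrame named InputDataSet, and the DataFrame the script assigns to OutputDataSet comes back as a result set. The sketch below assumes those default names and fakes InputDataSet locally so the script body can run outside SQL Server. The ReadingID and Temperature columns and the linear interpolation step are hypothetical, chosen only to echo the article's interpolation theme.

```python
import pandas as pd

# Stand-in for the data SQL Server would pass in via @input_data_1.
# Inside sp_execute_external_script, InputDataSet already exists.
InputDataSet = pd.DataFrame(
    {"ReadingID": [1, 2, 3, 4, 5],
     "Temperature": [21.5, None, 23.0, None, 24.5]}
)

# ---- script body (the text you would pass in @script) ----
df = InputDataSet.copy()
# Fill missing readings by linear interpolation between neighboring rows
df["Temperature"] = df["Temperature"].interpolate(method="linear")
OutputDataSet = df
# ---- end of script body ----

print(OutputDataSet)
```

Only the lines between the markers would travel to SQL Server; the stand-in DataFrame and the final print exist just for local testing.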

Prashanth Jayaram

An overview of Python vs PowerShell for SQL Server Database Administration

November 2, 2017

Today, Microsoft states that Linux runs as a first-class citizen on Azure, and .NET Core has been open-sourced and ported to Linux, bringing PowerShell along. PowerShell runs well on Ubuntu, CentOS, Red Hat Linux, and even Mac OS X. Alpha builds are available for a few other platforms as well, all released under the MIT License on GitHub. “Manage anything, anywhere” is what Microsoft is offering its customers. In keeping with that, we now have:

Prashanth Jayaram

Why would a SQL Server DBA be interested in Python?

October 23, 2017

If we follow blogs and publications on technological advances related to SQL, we notice an increasing number of references to Python lately. Often, that makes us wonder:

  • Why so much emphasis on Python these days?
  • Isn’t knowing PowerShell scripting sufficient for the automation requirements of today?
  • Is it time DBAs started learning a programming language such as Python in order to handle their day-to-day tasks more efficiently?
  • Why do so many job postings these days include “knowledge of scripting” as a requirement?
  • Is all of this happening because the paradigm is shifting? Can’t the current Microsoft-specific languages such as PowerShell handle the shift?
Prashanth Jayaram

How to use Python in SQL Server 2017 to obtain advanced data analytics

June 20, 2017

On the 19th of April 2017, Microsoft held an online conference called Microsoft Data Amp to showcase how Microsoft’s latest innovations put data, analytics, and artificial intelligence at the heart of business transformation. Over the last few years, Microsoft has made great strides in accelerating the pace of innovation, enabling businesses to meet the demands of a dynamic marketplace and harness the power of data more securely and faster than ever before.
