10 things you need to know to become a Data Scientist

August 22, 2016 by Minette Steynberg


If you have been browsing job ads lately, you will have noticed a large number of positions available for Data Scientists. The demand seems to be much larger than the supply, which means that there is a huge opportunity here. However, there appears to be a catch: most of these positions require some experience or knowledge in the field of Data Science. So if you are midway through your career, how can you skill up to become a Data Scientist?

Well, today I will attempt to answer this question.

What is Data Science?

Before we jump into how one can become a Data Scientist, let’s first have a quick look at what exactly Data Science is.

We are all aware of the so-called “explosion of data”. More and more data is gathered through the web, mobile apps, fitness devices and the like. This is collectively known as Big Data. Big Data refers not only to the volume of data, but also to its high velocity and high variety.

Data Science is the set of skills and techniques required to make sense of all this data, which includes advanced analytics, data mining, machine learning, data visualization and statistics. It’s the ability to draw insights from large amounts of raw data to solve real-world problems.

According to the 2015 Gartner report “Critical Capabilities for Operational Database Management Systems”:

“By 2017, all leading operational DBMSs will offer multiple data models, relational and NoSQL, in a single DBMS platform.”

We can already see this in SQL Server 2016, which now includes:

  • R Services

    R Services allows data scientists and analysts to run statistical programming queries directly on their database. It supports fast, parallel computations using multiple cores, processors and threads.

  • PolyBase

    PolyBase acts as a gateway between SQL Server and Hadoop or Azure blob storage, so you can use Transact-SQL to query non-relational data in the same way you would query relational data on your database.

  • Power BI

    Power BI is tightly integrated with SQL Server, allowing for easy analysis and sharing of data insights and the creation of rich visualizations.

  • Cortana Intelligence Suite on Azure

    The Cortana Intelligence Suite combines big data and advanced analytics, allowing you to get actionable intelligence from your data. You can create models with Azure Machine Learning, and analyze data in Azure Data Lake or SQL Data Warehouse using Azure Data Lake Analytics or Azure Stream Analytics, to mention but a few of the powerful tools which can be used with Cortana.

Keeping this in mind, a SQL Server professional will already have access to the tools required to become a Data Scientist.

    Here is a look at what Azure Machine Learning Studio looks like. You can try it out for free by going to this link and clicking on the Start Studio button.

    A myriad of helpful resources is available here to help you get started, including an interactive tutorial.

    Figure 1: Microsoft Azure Machine Learning in Action

    What do I need to know to be a Data Scientist?

    1. You need to understand data: know how to explore it and how to apply statistical and analytical techniques to it.

    2. You need to be able to query and manipulate data sets into required formats using Transact-SQL.

    3. You need to be able to present data in a meaningful way by using tools such as Excel or Power BI.

    4. You need to understand statistics, and its role in gaining insights from data.

    5. You need to know how to use a statistical programming language such as R or Python.

    6. You need to be able to perform data transformation, cleansing and some statistical analysis.

    7. You must understand data science concepts such as machine learning, algorithms, conditional probability, etc.

    8. You must be able to create machine learning models, and know how to evaluate them.

    9. You must be able to use machine learning to generate predictions and solve problems.

    10. You must learn how to use tools and technologies such as Microsoft Azure HDInsight, Scala, Spark, etc.
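To make points 6, 8 and 9 a little more concrete, here is a minimal sketch in Python using only the standard library. It is an illustration and not part of any course material: the data set (hours studied versus exam score) is made up, the function names are hypothetical, and a simple straight-line fit stands in for a "machine learning model". The sketch cleanses the data, holds back some rows for testing, fits the model on the rest, evaluates it, and then generates a prediction.

```python
import math

# Hypothetical raw records: (hours_studied, exam_score); None marks a missing value.
RAW = [(1, 52), (2, 55), (3, None), (4, 65), (5, 70), (None, 60),
       (6, 74), (7, 80), (8, 83), (9, 91), (10, 95)]


def clean(records):
    """Data cleansing: drop rows with a missing value and convert to floats."""
    return [(float(x), float(y)) for x, y in records
            if x is not None and y is not None]


def fit_line(rows):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(rows, slope, intercept):
    """Evaluate the model: root mean squared error on the given rows."""
    squared_errors = [(y - (slope * x + intercept)) ** 2 for x, y in rows]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


rows = clean(RAW)

# Hold out every third row for testing; train on the remaining rows.
test = rows[::3]
train = [row for i, row in enumerate(rows) if i % 3 != 0]

slope, intercept = fit_line(train)
test_error = rmse(test, slope, intercept)

# Use the fitted model to predict the score for an unseen value.
predicted = slope * 11 + intercept

print(f"slope={slope:.2f} intercept={intercept:.2f}")
print(f"test RMSE={test_error:.2f}")
print(f"predicted score for 11 hours={predicted:.1f}")
```

In practice you would likely reach for R or a Python library rather than hand-written formulas, but the steps stay the same: cleanse, split, fit, evaluate, predict.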

    I know this is quite daunting. But it is achievable with some hard work and dedication. And luckily there are now multiple resources available to help you on your quest to become a Data Scientist.

    So how do I prove to a prospective employer that I am now a Data Scientist?

    Microsoft recognizes that there is an extreme shortage of data scientists and as such has embarked on a mission to facilitate the study of Data Science for those who want to embrace this new exciting career opportunity.

    As such, they have launched the Microsoft Professional Degree in Data Science, which will run for the first time on the 22nd of August 2016.

    These courses have been designed in collaboration with employers and top universities such as Columbia and Harvard, and will be available on edX.com.

    The degree program consists of four units:

    • The Fundamentals

      This is where you will learn the basics, such as querying data and visualizing it. There are 3 compulsory courses in this unit and 1 elective, where you can choose between using Excel or Power BI.

    • Core Data Science

      In this unit you will learn how to use a statistical programming language. You can choose between Python and R.

    • Applied Data Science

      In this unit you will learn more advanced techniques using Python or R to be able to extract meaningful insights from your data.

    • A Cortana Intelligence Competition

      Finally, you get to prove your recently acquired skills by completing a real-world project, which will be scored and graded and will ultimately earn you your degree in Data Science.


    Microsoft estimates that there are in the region of 1.5 million jobs available for Data Scientists. Looking at the skills required to become a Data Scientist can take the wind out of your sails. But luckily, various universities and companies have recognized the shortage of skills and have started programs to bridge this gap.

    Microsoft itself is offering a degree program developed by industry experts and academics, which will open doors for many who aspire to become data scientists.

