Practical Data Science

You will gain the necessary practical skills to jump start your career as a Data Scientist!
4.0 (18 ratings)
667 students enrolled
$20
  • Lectures 41
  • Contents Video: 5.5 hours
    Other: 1 min
  • Skill Level All Levels
  • Languages English
  • Includes Lifetime access
    30 day money back guarantee!
    Available on iOS and Android
    Certificate of Completion

About This Course

Published 7/2015 English

Course Description

"Junior Level Data Scientist Median Salary from $91,000 and up to $250,000".

As an experienced Data Analyst I understand the job market and the expectations of employers. This data science course is specifically designed with those expectations and requirements in mind. As a result you will be exposed to the most popular data mining tools, and you will be able to leverage my knowledge to jump start (or further advance) your career in Data Science.

You do not need an advanced degree in mathematics to learn what I am about to teach you. Where books and other courses fail, this data science course excels: each section of code is broken down in Jupyter and explained in an easy-to-digest manner. Furthermore, you will work with real data and solve real problems, which gives you valuable experience!

What are the requirements?

  • Python - IPython Notebook (Download/Installation instructions will be provided)
  • You should have Microsoft Excel

What am I going to get from this course?

  • Understand the entire Data Science Process
  • Use Python and its Scientific Libraries: Pandas, NumPy, StatsModels and more...
  • Put Theory and Concepts into action through Practical Application
  • Use various Statistical Methods to Extract useful Information from Data
  • Hands on Experience with handling Big Data

What is the target audience?

  • Junior Data Scientist
  • Statistical Analyst
  • Data Analyst
  • This course is suited for individuals who want to advance their career in data science or data analytics

Curriculum

Section 1: What is Data Science?
05:23

This is an introduction to the topic of Data Science. We discuss what Data Science is and some of the buzzwords surrounding the subject.

13:55

We look at the most popular views on the Data Science Process to gain important insights into this topic. The topics include Knowledge Discovery in Databases (KDD), the Cross-Industry Standard Process for Data Mining (CRISP-DM) and much more.

Section 2: Python Basics
08:38

In this lecture we will install Anaconda, which is a completely free (and popular) Python distribution.

20:15

We look at the updated version of IPython, now known as Jupyter.

10:38

In this lecture, we cover the basics of a very popular scientific library in Python, called NumPy.
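
As a rough illustration of the kind of NumPy basics such a lecture might cover (the values and variable names below are made up for this sketch and are not taken from the course):

```python
import numpy as np

# A one-dimensional array built from a plain Python list (illustrative values).
heights_cm = np.array([150.0, 160.0, 170.0, 180.0])

# Arithmetic applies element-wise, with no explicit loop.
heights_m = heights_cm / 100.0

# A boolean mask selects the elements that satisfy a condition.
tall = heights_cm[heights_cm > 165.0]

# The same four values viewed as a 2x2 matrix.
grid = heights_cm.reshape(2, 2)

print(heights_m)
print(tall)
print(grid.shape)
```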

06:17

For the purpose of creating visuals, we look at matplotlib which is a 2D plotting library.

11:07

Pandas "aims to be the fundamental high-level building block for doing practical, real world data analysis in Python". This is one of the most important libraries for a data analyst to be familiar with when using Python. It leverages the power of NumPy and matplotlib, among other things.

Section 3: Statistical Methods → Data Summarization
05:39

In this two-part lecture on Data (or Variable) Types, we look at identifying different types of variables.

05:40

In the second part, we learn numerical methods of summarizing individual variables, whether they are qualitative or quantitative.

09:16

Here we look at calculating descriptive statistics in Python.
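
A minimal sketch of descriptive statistics in Python, assuming pandas is the tool used (the sample values are hypothetical and chosen only for illustration):

```python
import pandas as pd

# Hypothetical sample: eight ages (illustrative values only).
ages = pd.Series([22, 38, 26, 35, 35, 54, 2, 27], name="age")

summary = {
    "count": int(ages.count()),
    "mean": ages.mean(),
    "median": ages.median(),
    "std": ages.std(),  # sample standard deviation (ddof=1 by default)
    "min": ages.min(),
    "max": ages.max(),
}
print(summary)

# describe() bundles most of these figures into a single call.
print(ages.describe())
```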

05:54

We use Excel to generate Descriptive Statistics.

03:50

This can be thought of as a bonus lecture, where we use SAS to access Descriptive Statistics.

Section 4: Statistical Methods → Exploratory Data Analysis
09:08

Perhaps the most commonly used data visualization technique is the Histogram. This lecture answers two questions: what is a Histogram, and how do you generate one in Python?
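
One way to sketch the idea without a plotting window is to compute the bin counts with NumPy and print them as text bars. The data and bin edges below are assumptions for illustration; a plotting library such as matplotlib would normally draw the bars.

```python
import numpy as np

# Illustrative data and hand-picked bin edges.
values = np.array([1, 2, 2, 3, 3, 3, 4, 4, 5, 9])
bin_edges = [0, 2, 4, 6, 8, 10]

# counts[i] is the number of values in [edges[i], edges[i+1]);
# the last bin also includes its right edge.
counts, edges = np.histogram(values, bins=bin_edges)

for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"[{left:>2}, {right:>2}) " + "#" * int(count))
```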

04:50

Probability Mass Functions are not routinely covered in statistics texts; however, they can provide more information than a Histogram. We look at implementing Probability Mass Functions in Python.
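
A minimal sketch of an empirical Probability Mass Function, using only the standard library (the helper name `pmf` and the sample are hypothetical): each distinct value is mapped to its relative frequency.

```python
from collections import Counter


def pmf(values):
    """Map each distinct value to its relative frequency in the sample."""
    counts = Counter(values)
    n = len(values)
    return {value: count / n for value, count in sorted(counts.items())}


rolls = [1, 2, 2, 3, 3, 3, 6, 6]  # illustrative sample
probs = pmf(rolls)
print(probs)
```

Unlike raw histogram counts, the probabilities sum to one, which makes samples of different sizes directly comparable.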

18:21

The next logical concept in Exploratory Data Analysis (after Probability Mass Functions) is the Cumulative Distribution Function. We use smoothing to gain insights about the underlying distribution of our empirical data.
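
As a sketch of the unsmoothed starting point, an empirical CDF can be computed by sorting the sample and assigning each point its cumulative proportion. The helper name `ecdf` is hypothetical, and no smoothing is applied here.

```python
import numpy as np


def ecdf(sample):
    """Return sorted values x and the proportion of the sample <= each x."""
    x = np.sort(np.asarray(sample, dtype=float))
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y


x, y = ecdf([3.0, 1.0, 4.0, 2.0])  # illustrative sample
print(x)
print(y)
```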

05:03

In this lecture we look at Probability Density Functions and the difference between Empirical and Analytical distributions.

09:55

We look at the differences between Probability Density and Probability Distribution. Additionally, we look at how to generate a Kernel Density Plot in Python.
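
A kernel density estimate can be sketched by hand with NumPy: place a Gaussian bump on every observation and average them. The function name, sample and bandwidth below are assumptions for illustration; the lecture may well rely on a library routine instead.

```python
import numpy as np


def kde_gaussian(sample, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate at each grid point."""
    sample = np.asarray(sample, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # z[i, j] is the standardized distance from grid point i to observation j.
    z = (grid[:, None] - sample[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    return kernels.sum(axis=1) / (len(sample) * bandwidth)


sample = [1.0, 1.5, 2.0, 4.0, 4.2]  # illustrative observations
grid = np.linspace(-2.0, 8.0, 501)
density = kde_gaussian(sample, grid, bandwidth=0.5)

# A density should integrate to roughly one over a wide enough grid.
area = density.sum() * (grid[1] - grid[0])
print(round(area, 3))
```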

05:07

We move away from analysing individual variables and look at how variables affect each other. Specifically, we look at a very common technique, the Box Plot, to examine the relationship between two variables.

03:41

We continue with the Exploratory Data Analysis techniques to visualize two variables in concert. In this lecture, we look at Scatter Plots.

06:45

Here we look at methods that quantify the relationship between two variables. The two most common measures are Correlation and Covariance.
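
The two measures can be sketched directly from their definitions. The data are illustrative, and the sample (n - 1) versions are assumed.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Sample covariance: average product of deviations, using n - 1.
cov_xy = ((x - x.mean()) * (y - y.mean())).sum() / (len(x) - 1)

# Pearson correlation: covariance rescaled by both standard deviations.
corr_xy = cov_xy / (x.std(ddof=1) * y.std(ddof=1))

print(cov_xy, corr_xy)

# NumPy's built-ins give the same numbers.
print(np.cov(x, y)[0, 1], np.corrcoef(x, y)[0, 1])
```

Covariance depends on the units of both variables, while correlation is unitless and always lies between -1 and 1.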

06:07

Analyzing the relationship between two Categorical Variables can prove to be very insightful. In this lecture we look at comparing different populations, testing the difference and visualizing the relationship.
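
A first step in comparing two categorical variables is a contingency table. The sketch below assumes pandas and uses a small made-up data set; the column names are hypothetical, and the significance test itself is not shown.

```python
import pandas as pd

# Made-up records: one row per person (illustrative only).
df = pd.DataFrame({
    "sex": ["female", "female", "female", "male",
            "male", "male", "male", "female"],
    "survived": [1, 1, 0, 0, 0, 1, 0, 1],
})

# Raw counts for every combination of the two categories.
counts = pd.crosstab(df["sex"], df["survived"])

# Row-wise proportions make the two groups directly comparable.
rates = pd.crosstab(df["sex"], df["survived"], normalize="index")

print(counts)
print(rates)
```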

Section 5: Exploratory Data Analysis (EDA) → Practical Example
16:46

We conduct exploratory data analysis on the Titanic passenger data set made popular by Kaggle.

Section 6: Statistical Methods → Statistical Analysis
06:53

The Central Limit Theorem is a critical concept in statistics. The properties of this theorem allow us to make inferences about a population without knowing its true distribution. In this lecture we use simulations (in Python) to demonstrate the Central Limit Theorem (CLT) and use its properties to evaluate the central tendency and variance of a non-normal (population) distribution.
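
A simulation along these lines might look like the sketch below. The exponential population, the sample size and the seed are assumptions chosen for illustration, not the course's own setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# A clearly non-normal (right-skewed) population.
population = rng.exponential(scale=2.0, size=100_000)

# Draw 5,000 samples of size n (with replacement) and record each mean.
n = 50
sample_means = rng.choice(population, size=(5_000, n)).mean(axis=1)

# The CLT suggests the means centre on the population mean, with a
# spread close to the population standard deviation divided by sqrt(n).
print(population.mean(), sample_means.mean())
print(population.std() / np.sqrt(n), sample_means.std())
```

Plotting a histogram of `sample_means` would show an approximately bell-shaped curve even though the population itself is skewed.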

02:38

We expand on the previous lecture about the Central Limit Theorem and introduce estimation, specifically looking at the probability of correctly estimating a parameter.

07:11

In this lecture we answer:

  • Why use vectors in data analysis?
  • How to Represent vectors in Python?
  • What's the difference between using List and NumPy array?
  • Vectorization vs. Loops, Why use loops?
  • What are Matrices in Python?
  • and more...
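
To make the list-versus-array and loop-versus-vectorization questions concrete, here is a small sketch with illustrative values:

```python
import numpy as np

prices = [10.0, 20.0, 30.0]
quantities = [1, 2, 3]

# Loop version: multiply pair by pair and accumulate.
total_loop = 0.0
for price, quantity in zip(prices, quantities):
    total_loop += price * quantity

# Vectorized version: a single dot product over NumPy arrays.
total_vec = float(np.dot(np.array(prices), np.array(quantities)))

# Lists and arrays also disagree about what "+" means.
joined = prices + prices                     # concatenation, length 6
summed = np.array(prices) + np.array(prices)  # element-wise addition

print(total_loop, total_vec)
print(joined)
print(summed)
```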


06:00

You do not need to rely on any external packages in order to generate summary statistics. In this lecture, we discuss how matrices can be used to calculate summary statistics of one, two or many variables.
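
As a sketch of the matrix approach (NumPy is used here only to hold the matrices and multiply them; the data are illustrative), the mean vector and covariance matrix follow from a column of ones and the centered data matrix:

```python
import numpy as np

# Five observations (rows) of two variables (columns).
X = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 5.0],
              [4.0, 4.0],
              [5.0, 5.0]])
n = X.shape[0]
ones = np.ones((n, 1))

# Mean vector: (1/n) * 1'X, a 1 x p row of column means.
means = (ones.T @ X) / n

# Center the data, then form the covariance matrix: Xc'Xc / (n - 1).
centered = X - ones @ means
cov = centered.T @ centered / (n - 1)

print(means)
print(cov)
```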

02:33

We introduce Parametric Models (for Statistics) and extend this idea to Linear Response Modelling. Before we can apply this to popular statistical techniques such as Linear Regression, we need to discuss the assumptions of Linear Response Models.

07:39

In this lecture we define linear regression, estimate model parameters and list regression assumptions.

06:09

In this lecture we estimate regression model parameters through Ordinary Least Squares using Matrices.
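
The Ordinary Least Squares estimate in matrix form solves the normal equations (X'X) b = X'y. A minimal sketch with made-up data:

```python
import numpy as np

# Illustrative data that roughly follow y = 1 + 2x.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

# Design matrix: a column of ones for the intercept, then x.
X = np.column_stack([np.ones_like(x), x])

# Solve the normal equations rather than inverting X'X explicitly.
beta = np.linalg.solve(X.T @ X, X.T @ y)
intercept, slope = beta

print(intercept, slope)
```

Using `np.linalg.solve` instead of an explicit matrix inverse is a common numerical choice; both express the same estimator.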

Section 7: Application of Statistical Methods
09:27

Multiple regression in Excel: we look at important regression statistics and how they can be calculated from the sample and our regression line. We also look at the implications of multiple t-tests and why the F-test is more important for the regression model.

05:17

Linear Regression forms the basis of Statistical Analysis. We use a trusted Python library to find the Ordinary Least Squares (OLS) estimate in this practical example.

02:18

In this practical example we extend simple linear regression to multiple regression using the StatsModels Python library.

Section 8: Information Retrieval Using Query Language
01:28

The tools you need to complete the exercises for this section are discussed in this lecture. We also look at an important learning resource for SQL.

04:00

We discuss the CREATE TABLE statement in SQL and create our demo table.

03:17

We look at the SELECT statement and its SELECT DISTINCT variation in SQL. We also look at the LIMIT clause, which is equivalent to the SELECT TOP clause used in some other SQL dialects.

03:02

The ORDER BY keyword is used to sort the output in SQL; we discuss its usage in this video demonstration.

03:17

Grouping is commonly used to perform aggregation, and in this lecture we discuss the usage of GROUP BY in SQL.
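
The statements covered in this section can be tried from Python with the standard-library `sqlite3` module. SQLite, the `sales` table and its rows are assumptions made for this sketch; the course may use a different database and demo table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cur = conn.cursor()

# CREATE TABLE: a hypothetical demo table.
cur.execute(
    "CREATE TABLE sales ("
    "id INTEGER PRIMARY KEY, region TEXT NOT NULL, amount REAL NOT NULL)"
)
cur.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("East", 100.0), ("West", 250.0), ("East", 50.0),
     ("North", 75.0), ("West", 25.0)],
)

# SELECT DISTINCT: each region once.
regions = cur.execute(
    "SELECT DISTINCT region FROM sales ORDER BY region"
).fetchall()

# ORDER BY with LIMIT: the two largest sales.
top_two = cur.execute(
    "SELECT region, amount FROM sales ORDER BY amount DESC LIMIT 2"
).fetchall()

# GROUP BY: total sales per region.
totals = cur.execute(
    "SELECT region, SUM(amount) AS total FROM sales "
    "GROUP BY region ORDER BY total DESC"
).fetchall()

conn.close()
print(regions)
print(top_two)
print(totals)
```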

Section 9: Big Data
18:32

Data Integration is performed at an early stage of the data science process. This video introduces you to HDF (Hierarchical Data Format), and you will learn how to easily implement this platform-independent technology in Python.

19:44

We look at various methods available in Python that deal with large datasets which do not fit into memory. In addition we will look at combining chunks of these datasets to generate a Data Warehouse.
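
One common approach to data that does not fit into memory is to read it in chunks and combine partial results. The sketch below assumes pandas and uses a tiny in-memory CSV as a stand-in for a large file; the column names and chunk size are illustrative.

```python
import io

import pandas as pd

# Stand-in for a large file on disk (illustrative contents).
csv_text = "store,sales\nA,10\nB,20\nA,30\nB,40\nA,50\n"

totals = pd.Series(dtype="float64")

# chunksize makes read_csv yield DataFrames of at most two rows each.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    partial = chunk.groupby("store")["sales"].sum()
    # fill_value=0 handles stores that are missing from a given chunk.
    totals = totals.add(partial, fill_value=0)

print(totals)
```

Only one chunk is held in memory at a time, so the same loop works when the source is far larger than available RAM.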

Article

This lecture contains the updated Notebook of the example discussed in the previous video. Specifically, we utilize vectorization instead of for loops for the Table solution. This lecture is completely optional.

Section 10: Data Science for Business & Marketing
19:54

It is a common business objective to find which products or promotions increase sales. This lecture gives you an idea of how to use Exploratory Data Analysis as a means of Feature Selection as well as Knowledge Discovery. We then use multiple regression to verify whether the effect really exists (based on what we learned in our Exploratory Data Analysis!).

Instructor Biography

Atul Bhardwaj, Data Analyst

I have an educational background in statistics, data mining and data science. In addition to being a SAS 9 Base certified programmer, I have experience with real world data science projects and research (in the health care sector). Data Science is my passion, and I want to pass my knowledge on to like-minded people. Please review my LinkedIn page to learn more about me.

What's the Difference Between Data Science and Statistics?

Not long ago, the term "data science" meant nothing to most people -- even to those who worked in data. A likely response to the term was: "Isn't that just statistics?".

These days, data science is hot. The job of "data scientist" was referred to by the Harvard Business Review as the "Sexiest Job of the 21st Century." Why did data science come to exist? And just what is it that distinguishes data science from statistics?

The very first line of the American Statistical Association's definition of statistics is "Statistics is the science of learning from data..." Given that the words "data" and "science" appear in the beginning fragment of this definition, one might assume that data science is just a rebranding of statistics. A number of Twitter humorists certainly have:

"A data scientist is a statistician who lives in San Francisco"

"Data Science is statistics on a Mac."

While there's a grain of truth in these jokes, the reality is more complicated. Data science, and its differentiation from statistics, has deep roots in the history of computers.

Statistics was primarily developed to help people deal with pre-computer data problems like testing the impact of fertilizer in agriculture, or figuring out the accuracy of an estimate from a small sample. Data science emphasizes the data problems of the 21st Century, like accessing information from large databases, writing code to manipulate data, and data visualization.

A Computer from the 1960s.

The arrival of the personal computer revolutionized access to data and what could be done with that data. It can be argued that data science is simply a response to this new technology.

The first well-known appearance of the term data science is in legendary computer scientist Peter Naur's 1974 book Concise Survey of Computer Methods. In this book, Naur defines data science as "The science of dealing with data...." Right from the start, data science was not just about "analyzing" data (the bread and butter of classical statistics), but about "dealing" with it, using a computer. In Naur's book, "dealing" with data includes all of the cleaning, processing, storing and manipulation of data that happens before the data is analyzed, and the subsequent analysis.

Though the term data science did not catch on from Naur's usage, in the 1980s and 90s an innovative community of people who used computers to "deal with" data started to blossom. Groups like the International Association for Statistical Computing and KDNuggets came up with new ways to use computers to find meaning in data.

This innovation was prompted by two things: (1) the need to work with datasets larger than pre-computational statisticians could have conceived of, which would later come to be known as big data; and (2) an increased focus in industry on prediction -- of markets, of resources, of customer behavior, what have you -- for commercial uses. The inventors of data science borrowed from statistics, machine learning and database management to create a whole new set of tools for those working with data.

Statistics, on the other hand, has not changed significantly in response to new technology. The field continues to emphasize theory, and introductory statistics courses focus more on hypothesis testing than statistical computing.

Within the field of statistics, there were a few who believed that the discipline should transform itself to fit the changing landscape. In 2001, the influential statistician William Cleveland wrote a paper which suggested expanding the field of statistics and renaming it "data science." This new field would include a greater focus on real world "data analysis" and "computing." Cleveland's dream never came to pass, but many universities do now have data science departments -- in addition to their statistics departments.

Perhaps the most accurate Twitter quip about data scientists is the following:

"A data scientist is someone who is better at statistics than any software engineer and better at software engineering than any statistician."

***

Statistician and data visualizer Nathan Yau of Flowing Data suggests that data scientists typically have 3 major skills:

1. They have a strong knowledge of basic statistics and machine learning (at least enough to avoid misinterpreting correlation for causation, or extrapolating too much from a small sample size).

2. They have the computer science skills to take an unruly dataset and use a programming language (like R or Python), to make it easy to analyze.

3. They should be able to present that data and their analysis in a way that is meaningful to somebody less conversant in data, through visualization and summary.

Andrew Gelman, a statistician at Columbia University, writes that it is "fair to consider statistics... as a subset of data science" and probably the "least important" aspect. He suggests that the administrative aspects of dealing with data like harvesting, processing, storing and cleaning are more central to data science than hard core statistics.

The academic backgrounds of Udemy users who take data science and statistics courses demonstrate both the similarities and differences between the disciplines. The following table shows the ten most common academic backgrounds of Udemy users who took one of our statistics or data science courses.

Data Via Udemy

For courses in both statistics and data science, the most common backgrounds are Computer Science and Economics. The differences appear lower down the list. More of our data science students have a background in computationally heavy disciplines like electrical engineering, mathematics and accounting. In contrast, those taking statistics courses are more likely to have focused on a less mathematical discipline, like graphic design, marketing or psychology.

***

It's not just hype: data science really is in the ascendancy. According to data from the job search website Indeed.com, there were barely any job postings for data scientists before 2011, but by 2015 the demand for data scientists had surpassed the demand for statisticians. The chart below displays the percentage of all jobs posted for data scientists and for statisticians over the last ten years.

Via Indeed.com

Data scientist jobs are on the rise while statistician positions are on the decline. It is likely that some of the positions that, in the past, would have been listed for statisticians are now listed for data scientists (some firms use the terms interchangeably). But it is not just a tradeoff. For data scientists and statisticians combined, there were more than twice as many jobs listed in early 2015 as there were in early 2012.

Data science jobs are not just more common than statistics jobs; they are also more lucrative. According to Glassdoor, the national average salary for a data scientist position is $118,709 compared to $75,069 for statisticians.

***

Arguments over the differences between data science and statistics can sometimes get contentious. When the term "data science" came to prominence around 2011 there was a backlash. At that time, one well-known statistician referred to the position of a data scientist as "just the hip new name for statistician that will probably sound stupid 5 years from now."

But data science and statistics both continue to exist and there is no indication that either will go away. Although there is a great deal of overlap between the disciplines, data science developed for a very good reason. For the most part, statisticians chose not to take on the data problems of the computer age.