Udemy
DATA SCIENCE with MACHINE LEARNING and DATA ANALYTICS
Rating: 4.2 out of 5 (69 ratings)
533 students

DATA SCIENCE with MACHINE LEARNING and DATA ANALYTICS using R Programming, PYTHON Programming, WEKA Tool Kit and SQL
Last updated 3/2019
English

What you'll learn

  • DATA SCIENCE with MACHINE LEARNING and DATA ANALYTICS using R, PYTHON, WEKA and SQL
  • This course is designed for graduates as well as software professionals who want to learn data science in simple, easy steps using R programming, Python programming, the WEKA tool kit and SQL.

Course content

1 section • 86 lectures • 72h 22m total length
  • Introduction to Data Science (53:16)

    INTRODUCTION TO DATA SCIENCE:

    • What is Data Science?

    • Who is a Data Scientist?

    • Who can be a Data Scientist?

    • Data Science Process

    • Modern Data Scientist

    • Data Science Workflow

    • Technologies used in Data Science


    What is Data Science:

    • Data science is a "concept to unify statistics, data analysis, machine learning and their related methods" in order to "understand and analyze actual phenomena" with data.

    • Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from data in various forms, both structured and unstructured, similar to data mining.

    • The role of data scientist has been called "The Sexiest Job of the 21st Century" (Harvard Business Review, 2012).


    DATA ANALYSIS:

    • Data analysis is the process of extracting information from data. It involves multiple stages including establishing a data set, preparing the data for processing, applying models, identifying key findings and creating reports.

    • The goal of data analysis is to find actionable insights that can inform decision making.

    • Data analysis can involve data mining, descriptive and predictive analysis, statistical analysis, business analytics and big data analytics.


    Who is a Data Scientist:

    • Statistician + Software Engineer

    • A data scientist is a person who is better at statistics than any software engineer and better at software engineering than any statistician.


    Who can be a Data Scientist:

    Anyone who combines computing skills, knowledge of mathematics, probability and statistics, and domain expertise can become a data scientist.


    Data Science Process:

    Real World   ->  Raw data collected  ->  Data is processed  -> Clean Data set  ->  Exploratory Data Analysis  ->  Models & Algorithms  ->  Communicate visual report (Making Decisions) ->  Data Product  ->  Real World


    Modern Data Scientist:

    • Math & Statistics

    • Programming & Database

    • Domain Knowledge & Soft Skills

    • Communication & Visualization


    Data Science Workflow:

    • Problem definition

    • Data Collection & Preparation

    • Model Development

    • Model Deployment

    • Performance Improvement


    Technologies used in Data Science:

    • R

    • Python

    • Weka, etc.

  • Introduction to Machine Learning (44:38)

    Machine Learning:

    • It is similar to human learning.

    • Machine learning is the sub-field of computer science that, according to Arthur Samuel, gives "computers the ability to learn without being explicitly programmed."

    • Samuel, an American pioneer in the field of computer gaming and artificial intelligence, coined the term "machine learning" in 1959 while at IBM.

    • Machine learning is a field of computer science that uses statistical techniques to give computer systems the ability to "learn" (i.e., progressively improve performance on a specific task) with data, without being explicitly programmed.


    Traditional Programming vs Machine Learning:

    • In traditional programming, we give the computer inputs and a program, and the computer produces the output.

    • In machine learning, we give the computer inputs and the expected outputs, and the computer produces the program (a predictive model).


    Example 1: Here "a" and "b" are inputs and "c" is the output.

    a   b   c
    1   2   3
    2   3   5
    3   4   7
    4   5   9
    9   10  ?

    What is the value of c?
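    A minimal sketch of the "inputs + outputs give the program" idea for Example 1, assuming R's built-in lm() as the learner (the notes have not introduced it yet). The intercept is dropped with "- 1" because in this data b is always a + 1, so a model that also had an intercept would be rank-deficient.

```r
# Example 1 data: inputs a, b and output c
train <- data.frame(a = c(1, 2, 3, 4),
                    b = c(2, 3, 4, 5),
                    c = c(3, 5, 7, 9))

# Learn c from a and b; "- 1" removes the intercept term
fit <- lm(c ~ a + b - 1, data = train)
print(coef(fit))   # both coefficients are (numerically) 1, i.e. c = a + b

# Predict the unknown row a = 9, b = 10
pred <- predict(fit, newdata = data.frame(a = 9, b = 10))
print(pred)        # 19
```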


    Example 2: Here "x" is the input and "y" is the output.

    x     y
    1     10
    2     20
    3     30
    4     40
    5     ?
    500   ?

    y ~ x :     y = 10x


    Example 3: Here "x" is the input and "y" is the output.

    x     y
    1     14
    2     18
    3     22
    4     26
    5     ?
    500   ?

    Here we can observe a linear relationship, which linear regression can learn:

    y ~ x :     y = mx + c    here m is the slope and c is the intercept (constant)

                  y = 4x + 10
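    The same idea for Example 3, sketched with R's lm() (again an assumption at this point in the notes): the training pairs go in, the slope m and intercept c come out as the learned model, and the model then predicts the unknown outputs.

```r
# Example 3 data: input x and output y
train <- data.frame(x = c(1, 2, 3, 4),
                    y = c(14, 18, 22, 26))

# Fit y ~ x, i.e. y = m*x + c
fit <- lm(y ~ x, data = train)
print(coef(fit))   # (Intercept) 10, x 4  ->  y = 4x + 10

# Predict y for the unknown inputs x = 5 and x = 500
pred <- predict(fit, newdata = data.frame(x = c(5, 500)))
print(pred)        # 30 2010
```

    Example 2 works the same way with y = c(10, 20, 30, 40), giving a slope of 10 and an intercept of (numerically) 0.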

  • Introduction to R Programming (42:57)

    Machine Learning Engineer:

    1. Convert the business data into a statistical model

    2. Make the machine develop (train) the model

    3. Evaluate the performance of the model

              Actual vs Predicted (% accuracy, % error)

    4. Techniques to improve the performance.

              (Classification, Regression, Clustering)
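    Step 3 (actual vs predicted) can be illustrated with a small sketch. The labels below are made-up illustration data, not from any course data set.

```r
# Hypothetical actual and predicted labels (made-up illustration data)
actual    <- c("yes", "no", "yes", "yes", "no", "no",  "yes", "no")
predicted <- c("yes", "no", "no",  "yes", "no", "yes", "yes", "no")

# Confusion table: rows are actual values, columns are predicted values
print(table(Actual = actual, Predicted = predicted))

# % accuracy = share of predictions that match the actual value
accuracy <- mean(actual == predicted) * 100
error    <- 100 - accuracy

cat("Accuracy:", accuracy, "%\n")   # Accuracy: 75 %
cat("Error:", error, "%\n")         # Error: 25 %
```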


    Technologies used in Machine Learning:

    • R

    • Python

    • Weka

    • Amazon Machine Learning

    • Java, etc.


    R:

    • R is a programming language

    • Free software

    • Statistical computing, graphical representation and reporting.

    • Designed by: Ross Ihaka and Robert Gentleman; developed at the University of Auckland

    • Derived from the S language (S-PLUS is a commercial implementation of S)

    • Typing discipline: Dynamic

    • Stable release at the time of recording: 3.5.1 ("Feather Spray") / July 2, 2018

    • First appeared: August 1993

    • License: GNU GPL

    • Supports a functional style of programming

    • Interpreted programming language

    • Distributed through CRAN (Comprehensive R Archive Network)

    • Open source product (R Community)

    • Functions are distributed as packages

    • Default packages such as base, utils, stats and graphics are already attached to the R console

    • Other installed packages must be attached to the R session before use

    • Add-on packages are installed from CRAN mirrors


    Write a program to print HELLO WORLD in C language:

    #include <stdio.h>

    int main(void)
    {
        printf("HELLO WORLD\n");
        return 0;
    }


    Write a program to print HELLO WORLD in Java:

    class Hello
    {
        public static void main(String[] args)
        {
            System.out.println("HELLO WORLD");
        }
    }


    Write a program to print HELLO WORLD in R:

    print("HELLO WORLD")


    NOTE: As the examples show, R needs far less boilerplate code than traditional programming languages such as C, C++, C# and Java, which makes it simpler to learn.

  • R Installation & Setting R Environment (50:16)

    How to Download & Install R:

    • Go to the official R website, www.r-project.org

      (or)

    • Search "R" in Google and click the first link (The R Project for Statistical Computing).

    • Click on "Download R".

    • Click on any one of the CRAN mirrors, e.g. https://cloud.r-project.org

    • Click on Download R for Windows.

    • Click on Install R for the first time.

    • Finally click on Download R 3.5.1 for Windows (32/64 bit).


    Setting R Environment:

    • R comes with a lot of packages.

    • By default, only some of them are attached to the R environment.

    1. search()

          displays the currently attached packages

    2. installed.packages()

          displays the installed packages in the machine

    3. library(package name) / require(package name)

         attaches the packages to the R application

    4. install.packages("package name")

          installs the add-on packages from CRAN

    5. detach(package:package name)

          detaches the packages from the R environment
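    The commands above can be tried safely with splines, a package that ships with every R installation but is not attached by default (chosen here only as an example). install.packages() is left commented out because it needs an internet connection.

```r
# Packages currently attached to the session (the search path)
print(search())

# splines is installed with R itself, so this should be TRUE
has_splines <- "splines" %in% rownames(installed.packages())

# Attach it, confirm it is on the search path, then detach it again
library(splines)
attached_after_library <- "package:splines" %in% search()
detach(package:splines)
attached_after_detach <- "package:splines" %in% search()

cat("attached after library():", attached_after_library, "\n")
cat("attached after detach():", attached_after_detach, "\n")

# install.packages("stringi")  # would download an add-on package from a CRAN mirror
```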


    Package - Help

    • library(help="package name")


    Function - Help

    • help(function name)

      or

    • ?function name

  • Variables, Operators & Data types (53:10)

    Variables in R:

    A valid variable name consists of letters, numbers and the dot or underscore characters. It starts with a letter, or with a dot that is not followed by a number.


    Variable Name   Validity
    a_2.            Valid
    .a              Valid
    a.b             Valid
    a%              Invalid
    1a              Invalid
    .1a             Invalid
    _a              Invalid
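    One way to check these rules in R is the common idiom make.names(x) == x: make.names() turns any string into a syntactically valid name, so a name is valid exactly when it comes back unchanged. (make.names() also treats reserved words such as if as invalid.)

```r
candidates <- c("a_2.", ".a", "a.b", "a%", "1a", ".1a", "_a")

# make.names() repairs invalid names (e.g. "1a" -> "X1a"), so valid names are unchanged
valid <- make.names(candidates) == candidates

print(data.frame(name = candidates,
                 validity = ifelse(valid, "Valid", "Invalid"),
                 stringsAsFactors = FALSE))
```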



    Operators in R:

    • An operator is a symbol that tells the interpreter to perform specific mathematical or logical manipulations.

    • We have the following types of operators in R Programming:

    1. Relational operator

      ==,<,>,<=,>=,!=

    2. Logical operators

      & (AND), | (OR), ! (NOT)

    3. Mathematical operators

      +, -, *, /, %% (modulus), ^ or ** (exponent), %/% (integer division)

    4. Assignment operators

      =, <-, ->, assign("var_name", value)
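    A quick sketch exercising each operator group; x and y are arbitrary illustration values.

```r
x <- 17
y <- 5

# Relational operators compare values and return TRUE/FALSE
gt <- x > y                      # TRUE
eq <- x == y                     # FALSE

# Logical operators combine TRUE/FALSE values
both   <- (x > 10) & (y > 10)    # FALSE
either <- (x > 10) | (y > 10)    # TRUE
negate <- !(x > 10)              # FALSE

# Mathematical operators
remainder <- x %% y              # 2
quotient  <- x %/% y             # 3
power1    <- y ^ 2               # 25
power2    <- y ** 2              # 25 (R also accepts ** for ^)

# Four ways to assign the value 10
a = 10
b <- 10
10 -> d
assign("e", 10)
print(c(a, b, d, e))             # 10 10 10 10
```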


    Data/Object types in R:

    • R is a dynamically typed language: variables are not declared with a type, and the same variable can hold values of different data types at different points in a program.


    1. Logical - TRUE,FALSE,T,F

    2. Double  - 10,20.30,45,-45

    3. Integer - 10L,35L,-55L

    4. Character - "Data", "Hills", "7"

    5. Complex  - 3+6i,2+10i


    typeof(var_name/value):

    • Returns the internal storage data type.

      a <- 10
      typeof(a)
      [1] "double"

      a <- "DataHills"
      typeof(a)
      [1] "character"


    To test the data type:

    is.<datatype>(var_name/value), e.g.

    is.logical(TRUE)


    To convert the data type:

    as.<datatype>(var_name/value), e.g.

    as.integer("25")
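    A short sketch of dynamic typing and of the is./as. function families; the values are arbitrary examples.

```r
v <- 10L                  # integer literal (note the L suffix)
t1 <- typeof(v)           # "integer"
v <- 10.5                 # the same variable now holds a double
t2 <- typeof(v)           # "double"
v <- "DataHills"          # ...and now a character string
t3 <- typeof(v)           # "character"

# is.<type>() tests a type, as.<type>() converts to it
print(is.logical(TRUE))   # TRUE
print(is.character(10))   # FALSE
n1 <- as.integer(10.9)    # 10 (the fractional part is dropped)
n2 <- as.numeric("3.14")  # 3.14
n3 <- suppressWarnings(as.integer("DataHills"))  # NA: the text is not a number
```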

  • Structures (47:07)

    Comments in R:

    ==============

    --> A single-line comment starts with #; R ignores everything from # to the end of the line.

    # Comments are like helping text in your R Program

    --> R has no dedicated multi-line comment syntax; a common workaround is to put the text inside if(FALSE) { }, which is never evaluated:

    if(FALSE) {

    "We put such comments inside, either

    single or double quote" }


    Printing Variables:

    ===================

    1. print()

    2. cat()


    print():

    -------

    --> print() function is used to print the value stored in a variable.

    Ex:

    a <- 10

    print(a)


    cat():

    -----

    --> cat() function is used to combine multiple items into one continuous printed output.

    Ex:

    a <- "DataHills"

    cat("Welcome to", a)
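    The difference between print() and cat() is easiest to see side by side; capture.output() is used here only to record exactly what each call prints.

```r
a <- "DataHills"

print(a)                       # [1] "DataHills"  (index prefix and quotes)
cat("Welcome to", a, "\n")     # Welcome to DataHills  (items joined by a space)

# capture.output() records what each call writes to the console
printed <- capture.output(print(a))
catted  <- capture.output(cat("Welcome to", a))
```

    Because cat() already inserts a space between items, writing "Welcome to " with a trailing space would print two spaces.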



    Datatype of a Variable:

    =======================

    1. typeof()

    2. class()

    3. mode()


    1. typeof(var_name/value)

    -------------------------

    --> typeof determines the (R internal) type or storage mode of any object

    Ex:

    typeof(a)

    typeof(10)


    2. class(var_name/value)

    -----------------------

    --> class() returns the class of an object. R possesses a simple generic function mechanism which can be used for an object-oriented style of programming.

    --> Method dispatch takes place based on the class of the first argument to the generic function.

    Ex:

    class(a)

    class(10)


    3. mode(var_name/value)

    -----------------------

    --> Get or set the type or storage mode of an object.

    Ex:

    mode(a)

    mode(10)
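    To see how typeof(), class() and mode() differ, the sketch below applies all three to a few sample values. The values are illustrative, and only the first class of each object is shown.

```r
values <- list(10, 10L, "DataHills", TRUE, 2+3i, sum, data.frame(x = 1))
labels <- c("10", "10L", "\"DataHills\"", "TRUE", "2+3i", "sum", "data.frame(x = 1)")

comparison <- data.frame(
  value  = labels,
  typeof = sapply(values, typeof),
  class  = sapply(values, function(v) class(v)[1]),  # first class only
  mode   = sapply(values, mode),
  stringsAsFactors = FALSE
)
print(comparison)
```

    Note that 10 and 10L share the mode "numeric" but differ in typeof() and class(), and a data frame is stored internally as a list.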



    Displaying & Deleting Variables in R:

    =====================================

    1. ls()

    2. rm()


    1. ls():

    --------

    --> ls() function is used to display all the variables currently available in the R environment.

    Ex:

    ls()


    --> ls() can also list only the variables whose names match a pattern (a regular expression), using the pattern argument.

    Ex:

    # Display the variables whose names start with "a"

    ls(pattern="^a")


    --> ls() can also display hidden variables, i.e., variables whose names start with a dot (.), by using all.names=TRUE.

    Ex: Display the variables which are hidden

    ls(all.names=TRUE)


    2. rm():

    --------

    --> rm() function is used to delete a variable.

    Ex:

    rm(a)


    --> Combining rm() with ls() deletes all the variables at once.

    Ex: Remove all the variables at a time

    rm(list=ls())
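    The pattern and all.names options, and rm(list = ls()), are easiest to see in a fresh environment, so your own variables are untouched. The sketch below uses a throwaway environment created with new.env(); the variable names are hypothetical.

```r
# A throwaway environment so the example does not touch your own variables
demo <- new.env()
local({
  apple   <- 1
  banana  <- 2
  avocado <- 3
  .hidden <- 4
}, envir = demo)

visible     <- ls(envir = demo)                    # apple, avocado, banana
starts_a    <- ls(envir = demo, pattern = "^a")    # apple, avocado
contains_a  <- ls(envir = demo, pattern = "a")     # also matches banana
with_hidden <- ls(envir = demo, all.names = TRUE)  # also includes .hidden

rm("banana", envir = demo)                         # delete one variable
rm(list = ls(envir = demo), envir = demo)          # delete the remaining visible ones
remaining <- ls(envir = demo, all.names = TRUE)    # only .hidden is left
```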



    Structures/Objects in R:

    ========================

    1. Vectors

    2. Lists

    3. Matrices

    4. Data Frames

    5. Arrays

    6. Factors

  • Vectors (1:04:04)

    Vectors:

    ========

    --> One-dimensional object whose elements all share the same (homogeneous) data type.

    --> To create a vector, use the function c()

    --> Here "c" means combine

    # If we try this:

    a <- 10,20,30,40

    # it gives an error.

    # Instead, combine all these values by using c()

    a <- c(10,20,30,40)


    # to check the internal storage of a

    typeof(a)

    # to check the internal storage of each value in a

    lapply(a,FUN=typeof)

    sapply(a,FUN=typeof)

    or

    lapply(a,typeof) # list of values

    sapply(a,typeof) # vector of values


    --> Vectors are the most basic R structures/objects

    --> The commonly used types of atomic vectors are (R also has a sixth, rarely used type, raw):

    1. logical

    2. integer

    3. double

    4. complex

    5. character


    Vector Creation:

    ================

    --> We can create vectors with a single element or with multiple elements.

    --> They are

    1. Single Element Vector

    2. Multiple Elements Vector


    Single Element Vector:

    ======================

    --> When we assign a single value to a variable, it becomes a vector of length 1 and belongs to one of the above vector types.

    Ex:

    a <- 10

    b <- 20L

    c <- "DataHills"

    d <- TRUE

    e <- 2+3i


    Multiple Elements Vector:

    =========================

    --> When we assign multiple values to a variable, it becomes a vector of length n and belongs to one of the above vector types.

    Ex:

    a <- c(10,20,30,40,50)

    b <- c(20L,40L,60L,80L)

    c <- c("Srinivas","DataHills","DataScience","MachineLearning")

    d <- c(T,FALSE,TRUE,F,T,F)

    e <- c(2+3i,4+4i,5+6i)
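    Because a single value is already a vector of length 1, vector functions such as length() and indexing work on it too. A minimal check using two of the vectors defined above:

```r
a <- 10                         # single element vector
e <- c(2+3i, 4+4i, 5+6i)        # multiple elements vector

single_len   <- length(a)       # 1
single_is_v  <- is.vector(a)    # TRUE: a "scalar" is a length-1 vector
multiple_len <- length(e)       # 3
print(a[1])                     # indexing works on a single value too
```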


    # Heterogeneous data type values are converted into homogeneous data type values:

    a <- c(10,20,30,40,"DataHills")

    Output:

    "10"  "20"  "30"  "40"  "DataHills"

    # The double values are converted to characters.


    Observe the results of some more examples:

    a <- c(10L,20)

    a <- c(T,5)

    a <- c(2+3i,"DataHills")

    a <- c(9L,30,4+5i)


    Each data type has a priority; when types are mixed, values of lower-priority types are converted to the highest-priority type present:

    1. CHARACTER (highest)

    2. COMPLEX

    3. DOUBLE

    4. INTEGER

    5. LOGICAL (lowest)


    a <- c(TRUE,30,20L,2+3i,"DataHills")

    a <- c(TRUE,30,20L,2+3i)

    a <- c(TRUE,30,20L)

    a <- c(TRUE,20L)
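    The priority order can be verified directly with typeof(), using the same vectors as above:

```r
types <- c(
  typeof(c(10L, 20)),                          # "double"
  typeof(c(TRUE, 5)),                          # "double" (TRUE becomes 1)
  typeof(c(2+3i, "DataHills")),                # "character"
  typeof(c(9L, 30, 4+5i)),                     # "complex"
  typeof(c(TRUE, 30, 20L, 2+3i, "DataHills")), # "character"
  typeof(c(TRUE, 30, 20L, 2+3i)),              # "complex"
  typeof(c(TRUE, 30, 20L)),                    # "double"
  typeof(c(TRUE, 20L))                         # "integer"
)
print(types)
print(c(TRUE, 5))                              # [1] 1 5
```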


    To generate a sequence of numeric values, use the colon operator:

    <Start_Value>:<End_Value>

    1:10

    10:1

    3.5:10.5

    10.5:3.5

    # Or use the seq() function

    Syntax: seq(from=VALUE,to=VALUE,by=VALUE)

    Ex:     seq(from=1,to=10,by=1)

    seq(to=10,by=1,from=1)

    seq(by=10,to=100,from=10)

    seq(1,10,by=2)

    seq(from=1,10,2)

    seq(1,to=10,2)

    seq(1,10,1)

    seq(2,20,2)

    seq(10,1,1) # Error

    seq(10,1,-1)

    seq(1,10,pi)

    seq(10)

    seq(-10)

    seq(1:10)
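    A few of the sequence examples above, captured in variables so their results can be inspected. The error case is wrapped in tryCatch() so the sketch keeps running; length.out is an extra seq() argument not covered above.

```r
s1 <- 3.5:10.5          # 3.5 4.5 ... 10.5 (steps of 1)
s2 <- seq(2, 20, 2)     # 2 4 6 ... 20
s3 <- seq(10, 1, -1)    # 10 9 ... 1
s4 <- seq(1, 10, pi)    # 1, 1 + pi, 1 + 2*pi (the next step would pass 10)
s5 <- seq(-10)          # a single number n gives 1:n, here 1 0 -1 ... -10

# seq(10, 1, 1) fails: a positive 'by' cannot count down from 10 to 1
err <- tryCatch(seq(10, 1, 1), error = function(e) conditionMessage(e))
print(err)

# length.out fixes the number of values instead of the step size
s6 <- seq(0, 1, length.out = 5)   # 0.00 0.25 0.50 0.75 1.00
```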

  • Vector Manipulation & Subsetting (1:06:03)
  • Constants (41:38)
  • RStudio Installation & Lists Part 1 (1:02:20)
  • Lists Part 2 (47:44)
  • List Manipulation, Subsetting & Merging (45:01)
  • List to Vector & Matrix Part 1 (49:52)
  • Matrix Part 2 (44:02)
  • Matrix Accessing (48:26)
  • Matrix Manipulation, rep Function & Data Frame (56:08)
  • Data Frame Accessing (54:01)
  • Column Bind & Row Bind (50:33)
  • Merging Data Frames Part 1 (50:04)
  • Merging Data Frames Part 2 (54:26)
  • Melting & Casting (52:55)
  • Arrays (43:50)
  • Factors (50:53)
  • Functions & Control Flow Statements (40:27)
  • Strings & String Manipulation with Base Package (53:22)
  • String Manipulation with stringi Package Part 1 (58:33)
  • String Manipulation with stringi Package Part 2 & Date and Time Part 1 (48:13)
  • Date and Time Part 2 (53:19)
  • Data Extraction from CSV File (42:02)
  • Data Extraction from Excel File (50:40)
  • Data Extraction from Clipboard, URL, XML & JSON Files (50:04)
  • Introduction to DBMS (50:22)
  • Structured Query Language, MySQL Installation & Normalization (41:36)
  • Data Definition Language Commands (1:02:24)
  • Data Manipulation Language Commands (47:29)
  • Subqueries & Constraints (16:07)
  • Aggregate Functions, Clauses & Views (7:21)
  • Data Extraction from Databases Part 1 (52:31)
  • Data Extraction from Databases Part 2 & dplyr Package Part 1 (52:39)
  • dplyr Package Part 2 (51:36)
  • dplyr Functions on Air Quality Data Set (57:01)
  • plyr Package for Data Analysis (46:51)
  • tidyr Package with Functions (50:48)
  • Factor Analysis (57:11)
  • prop.table & CrossTable (50:22)
  • Statistical Observations Part 1 (51:48)
  • Statistical Observations Part 2 (40:35)
  • Statistical Analysis on Credit Data Set (1:00:29)
  • Data Visualization, Pie Charts, 3D Pie Charts & Bar Charts (59:20)
  • Box Plots (54:38)
  • Histograms & Line Graphs (45:26)
  • Scatter Plots & Scatter Plot Matrices (1:03:47)
  • Low Level Plotting (56:01)
  • Bar Plot & Density Plot (46:31)
  • Combining Plots (35:37)
  • Analysis with Scatter Plot, Box Plot, Histograms, Pie Charts & Basic Plot (51:07)
  • matplot, ECDF & Box Plot with Iris Data Set (1:02:55)
  • Additional Box Plot Style Parameters (1:01:41)
  • set.seed Function & Preparing Data for Plotting (1:09:42)
  • qplot, Violin Plot, Statistical Methods & Correlation Analysis (59:26)
  • Chi-Squared Test, t-Test, ANOVA, ANCOVA, Time Series Analysis & Survival Analysis (54:42)
  • Data Exploration and Visualization (51:00)
  • Machine Learning, Types of ML with Algorithms (1:04:53)
  • How Machines Solve Real-Time Problems (43:33)
  • K-Nearest Neighbor (KNN) Classification (1:07:45)
  • KNN Classification with Cancer Data Set Part 1 (1:03:15)
  • KNN Classification with Cancer Data Set Part 2 (43:12)
  • Naive Bayes Classification (43:53)
  • Naive Bayes Classification with SMS Spam Data Set & Text Mining (58:43)
  • Word Cloud & Document Term Matrix (56:39)
  • Train & Evaluate a Model using Naive Bayes (1:11:40)
  • Markdown using knitr Package (1:02:15)
  • Decision Trees (57:16)
  • Decision Trees with Credit Data Set Part 1 (47:03)
  • Decision Trees with Credit Data Set Part 2 (45:11)
  • Support Vector Machine, Neural Networks & Random Forest (46:50)
  • Regression & Linear Regression (44:04)
  • Multiple Regression (48:24)
  • Generalized Linear Regression, Non-Linear Regression & Logistic Regression (35:37)
  • Clustering (29:04)
  • K-Means Clustering with SNS Data Analysis (1:06:17)
  • Association Rules (Market Basket Analysis) (39:32)
  • Market Basket Analysis using Association Rules with Groceries Data Set (56:19)
  • Waikato Environment for Knowledge Analysis (WEKA) (44:25)
  • Analysis & Prediction using WEKA Machine Learning Toolkit (44:43)
  • Python Libraries for Data Science (22:32)

Requirements

  • Before starting this course, you should have basic knowledge of writing R and Python code and running R or Python programs in any R or Python IDE. If you are completely new to data science, this course gives you a sound understanding of analysis and prediction.
  • Basic mathematics knowledge (probability and statistics), basic SQL queries and basic programming knowledge are enough.

Description

DATA SCIENCE with MACHINE LEARNING and DATA ANALYTICS using R Programming, PYTHON Programming, WEKA Tool Kit and SQL.


This course is designed for graduates as well as software professionals who want to learn data science in simple, easy steps using R programming, Python programming, the WEKA tool kit and SQL.


Data is the new oil. This statement shows how modern IT systems are driven by capturing, storing and analysing data for various needs, be it making business decisions, forecasting the weather, studying protein structures in biology or designing a marketing campaign. All of these scenarios involve a multidisciplinary approach of using mathematical models, statistics, graphs, databases and, of course, the business or scientific logic behind the data analysis. So we need programming languages that can cater to all these diverse needs of data science. R and Python stand out here, as they have numerous libraries and built-in features that make it easy to tackle the needs of data science.

In this course we will cover the various techniques used in data science using R programming, Python programming, the WEKA tool kit and SQL.

The most comprehensive Data Science course in the market, covering the complete Data Science life cycle from Data Collection, Data Extraction, Data Cleansing, Data Exploration, Data Transformation, Feature Engineering, Data Integration, Data Mining, building Prediction models and Data Visualization to deploying the solution to the customer. Skills and tools ranging from Statistical Analysis, Text Mining, Regression Modelling, Hypothesis Testing, Predictive Analytics, Machine Learning, Deep Learning, Neural Networks, Natural Language Processing, Predictive Modelling and R Studio to programming languages such as R and Python are covered extensively as part of this Data Science training.

Who this course is for:

  • All graduates are eligible to take this course.