
Explore predictive analytics by forecasting future values from current data, such as COVID-19 cases or vaccination rates. Assess the validity of predictions as conditions change, and define appropriate time horizons.
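As a minimal sketch of forecasting from current data, the snippet below fits a straight-line trend to made-up daily counts (not real COVID-19 figures) with NumPy and extrapolates a short horizon. The linear trend is an assumption chosen for simplicity, not the method the module necessarily uses.

```python
import numpy as np

# Made-up daily counts for illustration only (not real COVID-19 data)
days = np.arange(10)
cases = np.array([12, 15, 19, 22, 26, 31, 34, 38, 43, 47])

# Fit a straight-line trend: cases ~ slope * day + intercept
slope, intercept = np.polyfit(days, cases, deg=1)

# Forecast a short horizon (next 3 days). The further out we go, the more we
# rely on today's conditions staying the same, which is what breaks in practice.
future_days = np.arange(10, 13)
forecast = slope * future_days + intercept
print(np.round(forecast, 1))
```

Keeping the horizon short is the design choice here: a trend fitted to ten days says little about conditions a month later.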
Define the scope of an application by clarifying the business problem, objectives, inputs, and constraints, in order to build a predictive model that minimizes loan defaults while balancing profits.
Explore big data versus non-big data through the three Vs (volume, velocity, and variety), and learn how to make storage and compute decisions for structured, semi-structured, and unstructured data using SQL, NoSQL, and Hadoop.
Identify cross-sectional, time series, and panel data, and know when date and time matter. Distinguish data structures, such as multiple columns versus a single time column, to determine appropriate techniques.
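One way to picture the three structures in pandas is sketched below, assuming made-up values and hypothetical column names: cross-sectional data has no time column, a time series is indexed by a single time column, and panel data is indexed by both entity and time.

```python
import pandas as pd

# Cross-sectional: many entities observed at one point in time (no time column)
cross_sectional = pd.DataFrame({
    "customer": ["A", "B", "C"],
    "income": [50, 62, 48],
})

# Time series: one entity observed over time, indexed by a single time column
time_series = pd.Series(
    [100, 104, 103],
    index=pd.date_range("2021-01-01", periods=3, freq="D"),
    name="daily_sales",
)

# Panel: several entities, each observed over time (entity + time index)
panel = pd.DataFrame({
    "customer": ["A", "A", "B", "B"],
    "date": pd.to_datetime(["2021-01-01", "2021-01-02"] * 2),
    "balance": [10, 12, 7, 9],
}).set_index(["customer", "date"])

print(panel)
```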
Identify primary and secondary data sources, distinguish their roles, map input (independent) variables to output (dependent) variables, and convert unstructured data into structured formats for machine learning analysis.
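A simple illustration of turning unstructured text into a structured table is a bag-of-words count, sketched below. The notes are made up, and bag-of-words is just one common technique chosen for illustration, not necessarily the one the module covers.

```python
import re
from collections import Counter

import pandas as pd

# Made-up free-text notes (unstructured input)
notes = [
    "Customer missed payment, called twice",
    "Payment received on time",
    "Missed payment again",
]

def tokenize(text):
    # Lowercase and keep alphabetic words only
    return re.findall(r"[a-z]+", text.lower())

# One row per document, one column per word, cell = word count
rows = [dict(Counter(tokenize(n))) for n in notes]
table = pd.DataFrame(rows).fillna(0).astype(int)
print(table[["missed", "payment"]])
```

Each column of `table` can now serve as an input variable for a model.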
Explore secondary data sources and how they differ from primary data sources. See examples such as telecom customer data, open data like Google Maps, drone analytics, and syndicated data used to enrich insights.
Harness primary data sources by combining a bank's own data with external data from social media to improve loan default predictions, and distinguish primary data from secondary data in IoT contexts.
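Combining the bank's own records with external data usually comes down to a join on a shared key. The sketch below assumes a hypothetical `customer_id` key and made-up columns; a left join keeps every bank customer even when no external data exists for them.

```python
import pandas as pd

# Primary data: collected by the bank itself (made-up values)
bank = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "loan_amount": [5000, 12000, 8000],
    "defaulted": [0, 1, 0],
})

# External data, e.g. features derived from social media (hypothetical columns)
social = pd.DataFrame({
    "customer_id": [1, 3],
    "negative_post_ratio": [0.05, 0.20],
})

# Left join keeps every bank customer; missing external data shows up as NaN
enriched = bank.merge(social, on="customer_id", how="left")
print(enriched)
```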
Master probability basics using a die-rolling example and the formula P(event) = favorable outcomes / total outcomes, with events such as rolling a number greater than three or smaller than four.
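The favorable-over-total formula can be checked directly by enumerating the sample space of one fair six-sided die, as in this small sketch.

```python
from fractions import Fraction

# Sample space of one fair six-sided die
outcomes = range(1, 7)

def probability(event):
    # P(event) = favorable outcomes / total outcomes
    favorable = sum(1 for o in outcomes if event(o))
    return Fraction(favorable, len(outcomes))

p_greater_than_3 = probability(lambda o: o > 3)  # {4, 5, 6}
p_less_than_4 = probability(lambda o: o < 4)     # {1, 2, 3}
print(p_greater_than_3, p_less_than_4)           # 1/2 1/2
```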
Explore how box plots use percentiles, quartiles, and quantiles to display results: the minimum and maximum, the 25th, 50th, and 75th percentiles (Q1, Q2, Q3), and the 100th percentile (fourth quartile).
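The numbers behind a box plot can be computed with NumPy, as sketched below on a made-up sample. The 1.5 × IQR whisker rule is a common convention (used by matplotlib and seaborn by default), stated here as an assumption rather than something the module prescribes.

```python
import numpy as np

# Made-up sample with one extreme value
data = np.array([2, 4, 4, 5, 6, 7, 8, 9, 10, 30])

# Five-number summary: min (0th), Q1 (25th), median/Q2 (50th), Q3 (75th), max (100th)
q0, q1, q2, q3, q4 = np.percentile(data, [0, 25, 50, 75, 100])

# Conventional whiskers reach the last point within 1.5 * IQR of the box;
# anything beyond the fences is drawn as an individual outlier point
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = data[(data < lower_fence) | (data > upper_fence)]
print(q0, q1, q2, q3, q4, outliers)
```

Note that `np.percentile` interpolates linearly by default, so other tools may report slightly different quartiles for small samples.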
Assess normality with graphical techniques, including Q-Q plots, histograms, and box plots, and understand standardized values and theoretical quantiles to determine if data are normally distributed.
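A Q-Q plot pairs sorted standardized values with theoretical normal quantiles. The sketch below computes those pairs using the standard library's `statistics.NormalDist` and the plotting positions (i - 0.5) / n, which is one common choice among several; the samples are synthetic.

```python
from statistics import NormalDist

import numpy as np

def qq_points(sample):
    """Return (theoretical N(0,1) quantiles, sorted standardized sample values)."""
    sample = np.asarray(sample, dtype=float)
    z = np.sort((sample - sample.mean()) / sample.std(ddof=1))  # standardized values
    n = len(z)
    probs = (np.arange(1, n + 1) - 0.5) / n  # plotting positions (i - 0.5) / n
    theoretical = np.array([NormalDist().inv_cdf(p) for p in probs])
    return theoretical, z

rng = np.random.default_rng(0)
samples = {
    "normal": rng.normal(loc=50, scale=10, size=200),  # synthetic, normal
    "skewed": rng.exponential(scale=10, size=200),     # synthetic, right-skewed
}

# Points near the line y = x on a Q-Q plot suggest normality; the correlation
# between the two coordinates is a rough numeric summary of that straightness.
qq_r = {}
for name, s in samples.items():
    theoretical, z = qq_points(s)
    qq_r[name] = np.corrcoef(theoretical, z)[0, 1]
    print(name, round(qq_r[name], 3))
```

Passing `theoretical` and `z` to a scatter plot with a y = x reference line gives the usual Q-Q plot.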
Explore the third moment, skewness, and its formula: the average of (x minus mean) cubed, divided by sigma cubed. Learn through histogram examples how it reveals non-normal, positively or negatively skewed data.
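The skewness formula translates directly into NumPy. This sketch uses the population standard deviation for sigma; libraries such as pandas apply a small-sample adjustment, so their values can differ slightly.

```python
import numpy as np

def skewness(x):
    # Third standardized moment: mean of (x - mean)^3 divided by sigma^3
    x = np.asarray(x, dtype=float)
    deviations = x - x.mean()
    sigma = np.sqrt(np.mean(deviations ** 2))  # population standard deviation
    return np.mean(deviations ** 3) / sigma ** 3

print(skewness([1, 2, 3, 4, 5]))   # symmetric -> 0.0
print(skewness([1, 1, 1, 10]))     # long right tail -> positive
print(skewness([1, 10, 10, 10]))   # long left tail -> negative
```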
Explore univariate visualizations, including bar plots and histograms, to interpret single-variable data through bins and frequency distributions, and to tell normal patterns from non-normal ones.
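A histogram is a frequency distribution over bins. The sketch below builds one with `np.histogram` on made-up scores and prints it as a text bar chart, so the binning step is visible without any plotting library.

```python
import numpy as np

# Made-up exam scores
scores = np.array([42, 55, 58, 61, 63, 67, 70, 72, 75, 78, 81, 88, 93])

# Split the range 40-100 into 6 equal-width bins and count values in each
# (the last bin includes its right edge)
counts, edges = np.histogram(scores, bins=6, range=(40, 100))
for left, right, c in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.0f}-{right:.0f}: {'#' * c}")
```

Changing `bins` changes the shape you see, which is why bin width is worth experimenting with before judging a distribution.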
Explore univariate plots in Python with histograms, density plots, and box plots, and interpret skewness and distribution shape using pandas, NumPy, seaborn, and matplotlib.
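A minimal sketch of the three plots with matplotlib and pandas follows, on synthetic right-skewed data. To avoid extra dependencies, the density curve is a hand-rolled Gaussian kernel density estimate with a normal-reference bandwidth; in practice seaborn's `histplot`, `kdeplot`, and `boxplot` produce the same views with less code.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
values = pd.Series(rng.exponential(scale=2.0, size=500), name="wait_time")  # synthetic, right-skewed

fig, (ax_hist, ax_kde, ax_box) = plt.subplots(1, 3, figsize=(12, 3.5))

# Histogram: counts per bin
ax_hist.hist(values, bins=30)
ax_hist.set_title("Histogram")

# Density plot: Gaussian kernel density estimate, bandwidth h = 1.06 * std * n^(-1/5)
grid = np.linspace(values.min(), values.max(), 200)
h = 1.06 * values.std() * len(values) ** (-1 / 5)
kde = np.exp(-0.5 * ((grid[:, None] - values.to_numpy()[None, :]) / h) ** 2).sum(axis=1)
kde /= len(values) * h * np.sqrt(2 * np.pi)
ax_kde.plot(grid, kde)
ax_kde.set_title("Density (KDE)")

# Box plot: median, quartiles, whiskers, and outlier points
ax_box.boxplot(values.to_numpy())
ax_box.set_title("Box plot")

print("skewness:", round(values.skew(), 2))  # > 0 means right-skewed
```

A long right tail in the histogram, a density peak pushed to the left, and many upper outliers in the box plot all point to the same positive skewness.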
Load a dataset in Python, read a CSV file with pandas, and create a bivariate scatter plot of waist circumference versus adipose tissue, interpreting correlation and covariance for direction and strength.
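The sketch below reads CSV text with `pd.read_csv`, draws the scatter plot, and computes covariance and correlation. The column names `Waist` and `AT` and all values are made up; with a real file you would pass its path to `pd.read_csv` instead of the in-memory stand-in.

```python
import io

import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import pandas as pd

# In-memory stand-in for a CSV file (hypothetical columns, made-up values)
csv_text = """Waist,AT
70,30
80,55
90,80
100,120
110,150
"""
df = pd.read_csv(io.StringIO(csv_text))

# Bivariate scatter plot: waist circumference (x) vs adipose tissue (y)
ax = df.plot.scatter(x="Waist", y="AT")
ax.set_xlabel("Waist circumference")
ax.set_ylabel("Adipose tissue")

# Covariance: its sign gives direction, but its size depends on the units.
# Correlation: direction and strength on a unit-free scale from -1 to 1.
print(df["Waist"].cov(df["AT"]))   # sample covariance
print(df["Waist"].corr(df["AT"]))  # Pearson correlation
```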
Explore handling duplicates in data preprocessing with Python: consolidate multiple records into a single customer view, and remove duplicate rows or columns to improve data quality.
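The three duplicate-handling steps can be sketched in pandas as below, using made-up customer records and hypothetical column names. Summing purchases per customer is just one possible consolidation rule.

```python
import pandas as pd

# Made-up customer records: one exact duplicate row and one duplicated column
df = pd.DataFrame({
    "customer_id": [101, 101, 102, 103, 103],
    "name": ["Asha", "Asha", "Ravi", "Meera", "Meera"],
    "purchase": [250, 250, 400, 120, 80],
    "purchase_copy": [250, 250, 400, 120, 80],
})

# 1. Count and drop exact duplicate rows (keeps the first occurrence)
print("duplicate rows:", df.duplicated().sum())
deduped = df.drop_duplicates(keep="first")

# 2. Drop columns whose values repeat an earlier column (compare columns via transpose)
deduped = deduped.loc[:, ~deduped.T.duplicated().to_numpy()]

# 3. Consolidate the remaining records into one row per customer
single_view = deduped.groupby(["customer_id", "name"], as_index=False)["purchase"].sum()
print(single_view)
```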
This program helps aspirants entering the field of data science understand project management methodology, a structured approach to handling data science projects. You will learn the importance of understanding the business problem alongside its objectives and constraints, and of defining success criteria. Success criteria cover business, machine learning (ML), and economic aspects. Learn about the Project Charter, the first document created on any project. The various data types and the four measures of data will be explained, alongside data collection mechanisms, so that appropriate data is obtained for further analysis. Primary data collection techniques, including surveys and experiments, will be explained in detail. Exploratory Data Analysis (EDA), or descriptive analytics, will be explained with a focus on all four business moments (central tendency, dispersion, skewness, and kurtosis) as well as graphical representations, including univariate, bivariate, and multivariate plots. Box plots, histograms, scatter plots, and Q-Q plots will be explained. The prime focus is on understanding data preprocessing techniques using Python, which ensures that appropriate data is given as input for model building. Data preprocessing techniques such as outlier analysis, imputation, and scaling will be discussed using practice-oriented datasets.
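A compact sketch of the three preprocessing steps named above (outlier treatment, imputation, scaling) on a made-up column. Capping at the 1.5 × IQR fences, median imputation, and z-score or min-max scaling are common choices assumed here for illustration; the program may cover other variants.

```python
import numpy as np
import pandas as pd

# Made-up column with one missing value and one extreme value
df = pd.DataFrame({"income": [42.0, 38.0, np.nan, 45.0, 40.0, 300.0]})

# 1. Outlier treatment: cap values outside the 1.5 * IQR fences (NaN is left as is)
q1, q3 = df["income"].quantile([0.25, 0.75])
iqr = q3 - q1
df["income"] = df["income"].clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)

# 2. Imputation: fill missing values with the median (robust to skew)
df["income"] = df["income"].fillna(df["income"].median())

# 3a. Scaling: standardize to mean 0, standard deviation 1
df["income_z"] = (df["income"] - df["income"].mean()) / df["income"].std()

# 3b. Alternative scaling: min-max normalization to the range [0, 1]
income_range = df["income"].max() - df["income"].min()
df["income_minmax"] = (df["income"] - df["income"].min()) / income_range
print(df)
```

Treating outliers before imputing keeps the extreme value from pulling the fill value; with the median the effect is small, but with a mean-based fill it would matter.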