
The tutor introduces himself, highlighting a data science and consulting background across global companies and a humorous, engaging approach to teaching CRISP-ML(Q) data understanding.
Explore the agenda and stages of analytics within a data science training program, and learn how a project management methodology governs real-world analytics from a high-level overview to details.
Apply diagnostic analytics to explain why something happened, such as an increase in COVID-19 cases. Tag factors like lockdowns and vaccination to account for the increase and drop in cases.
Predictive analytics forecasts future outcomes using current data, such as COVID-19 cases and vaccination rates. Assess horizon validity and adapt predictions as conditions change.
Explore prescriptive analytics through what-if scenarios that translate predictions into actions. Learn the four analytics stages—descriptive, diagnostic, predictive, and prescriptive—with real-world examples.
Explore the CRISP-ML(Q) framework and its six phases: business and data understanding, data preparation, model building and tuning, evaluation, deployment, and monitoring and maintenance, focused on ongoing data science projects.
Define the scope of application and the business objective: minimize loan defaulters under constraints. Use inputs x and output y with survival analytics to predict default risk and balance profits.
Define the business success criteria by aligning KPIs such as loan default rate with machine learning and ROI, balancing accuracy and performance with practical constraints.
Explore business understanding and use cases, balancing fraud minimization with customer convenience. Learn drone-driven precision farming, multispectral sensing, and cost-aware optimization.
Explore the data understanding phase by identifying data types and scales of measurement, and define key terms while describing primary and secondary data collection techniques.
Learn how data understanding drives analysis, modeling, predictions, and optimization to support management decisions, with examples of sales data, leading to what-if analysis and strategic levers.
Explore the differences between continuous and discrete data, identifying decimal-representable values versus counts, with examples like time, money, height, and weight.
Contrast categorical data with count data, highlighting binary (boolean) and multiple categorical types, then apply to churn, defaults, claims, and other business examples.
Understand practical data concepts through real-time examples. Distinguish nominal data like flight numbers, ordinal data like gate numbers, interval data like temperatures, and ratio data like money with absolute zero.
Contrast quantitative and qualitative data by illustrating numeric versus descriptive measures, including continuous and count data alongside categorical data, and explain which data best informs decision making.
Explore the scales of measurement from nominal to ratio data, where nominal supports counts and frequencies, ordinal enables ranking, interval allows addition and subtraction, and ratio permits multiplication and division.
Explain structured data in tabular form versus unstructured data like videos, images, audio, and text; show transformation into structured data and discuss semi-structured formats such as HTML, XML, and JSON.
Compare big data and non-big data through the three Vs: volume, velocity, and variety, and choose appropriate storage and processing with SQL vs NoSQL and Hadoop.
Compare cross-sectional, time series, and panel (longitudinal) data, noting that cross-sectional data ignores date and time while time series data emphasizes them; panel data blends both properties, with multiple columns and observations.
Learn how balanced, imbalanced, and rare events affect data understanding. Explore two-class and multi-class examples such as employee attrition and fraud, and normal vs non-normal distributions.
Compare batch offline processing with live streaming online data, illustrating loan default predictions and fraud detection through dashboards, rules, and automated versus manual decisions.
Master data collection concepts by distinguishing primary and secondary data sources, and learn dataset terms from input and output variables to structured and semi-structured data.
Understand secondary and primary data sources, and see how combining internal data, free Google Maps, and drone analytics yields insights for telecom 5G planning; primary data fills gaps.
Explore how primary data sources, including social media and IoT sensor data, augment loan default predictions and data quality, while addressing data privacy and secondary data considerations.
Learn to design end-to-end data collection with surveys: link business reality to root-cause analysis, form research objectives, and craft multidimensional constructs and targeted questions.
Apply design of experiments to data collection, comparing discount expiry, distance radius, and timing to reveal how customers redeem coupons and respond to promotions.
Identify possible data collection errors, including random, systematic, and device-related biases from harsh environments. Apply gauge R&R and SOPs to ensure data quality and representativeness.
Learn to identify bias and ensure fairness by using representative data, avoiding race or gender as inputs in loan default models, and preventing biased outcomes in applications like facial recognition.
This course will help you understand the basics of Data Science and EDA using Python, and it dives deep into the project management methodology CRISP-ML(Q), which stands for Cross-Industry Standard Process for Machine Learning with Quality Assurance. Data Science is now used in nearly every sector. Its purpose is to find trends and patterns in the available data using a variety of techniques, and Data Scientists are also responsible for drawing insights from that analysis. Data Science is a multidisciplinary field that involves mathematics, statistics, computer science, Python, machine learning, and more, and Data Scientists need to be adept in all of them. This course will give you an understanding of each of these topics.
The course explains the six stages of CRISP-ML(Q) in detail. These six stages are as follows:
Business and Data Understanding
Data Preparation
Model Building
Evaluation
Model Deployment
Monitoring & Maintenance
You will learn the importance of business objectives and constraints, business success criteria, economic success criteria, and the project charter. The course describes the various data types in detail: continuous and discrete; qualitative and quantitative; structured, semi-structured, and unstructured; big and non-big data; cross-sectional, time series, and panel data; balanced and imbalanced data; and, finally, offline and live streaming data. It also looks at data collection, covering primary and secondary data sources, data version control, data description, data requirements, and data verification.
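As a rough illustration of a few of these data types, the sketch below parses semi-structured JSON into structured rows and then checks how balanced a binary label is. The loan records, the field names, and the class_balance helper are all hypothetical, invented for this example rather than taken from the course material.

```python
import json
from collections import Counter

# Hypothetical semi-structured loan records (JSON); field names are illustrative.
# income is continuous, dependents is a discrete count, defaulted is binary categorical.
raw = """
[
  {"id": 1, "income": 52000.5, "dependents": 2, "defaulted": false},
  {"id": 2, "income": 31000.0, "dependents": 0, "defaulted": false},
  {"id": 3, "income": 18500.75, "dependents": 3, "defaulted": true},
  {"id": 4, "income": 76000.0, "dependents": 1, "defaulted": false}
]
"""

records = json.loads(raw)

# Flatten into a structured (tabular) form: one tuple per row, fixed column order.
columns = ["id", "income", "dependents", "defaulted"]
rows = [tuple(rec[col] for col in columns) for rec in records]


def class_balance(labels):
    """Return the share of each class label, to judge balanced vs imbalanced data."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


balance = class_balance(rec["defaulted"] for rec in records)
print(rows)
print(balance)
```

A share far from an even split, as with the defaulted label here, is a hint that the data is imbalanced and that plain accuracy may be a misleading metric.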
Data Preparation, which comprises data cleansing, EDA using Python (also called descriptive statistics), and feature engineering, will be explained in detail. Data cleansing involves numerous methods such as typecasting, handling duplicates, outlier treatment, zero and near-zero variance, missing values, discretization, dummy variables, transformation, standardization, and string manipulation. EDA using Python covers measures of central tendency (mean, median, and mode), measures of dispersion (variance, standard deviation, and range), skewness, and kurtosis, which are also termed the first, second, third, and fourth moment business decisions, respectively. It also covers bar plots, Q-Q plots, box plots, histograms, scatter plots, and more. Feature engineering, the last part of data preparation, will also be given enough coverage.
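The four moment business decisions can be sketched with the Python standard library alone. This is a minimal example, not the course's own code: the sample values and the moments helper are made up, and it assumes population formulas (dividing by n) and reports excess kurtosis (kurtosis minus 3). Libraries such as pandas or SciPy may apply different bias corrections and give slightly different numbers.

```python
import statistics


def moments(values):
    """Summarize a sample with the four 'moment business decisions'.

    Assumes population formulas (divide by n) and a non-constant sample,
    so the standard deviation is not zero.
    """
    n = len(values)
    mean = statistics.fmean(values)
    variance = statistics.pvariance(values, mu=mean)
    std = variance ** 0.5
    # Third and fourth standardized moments.
    skewness = sum(((x - mean) / std) ** 3 for x in values) / n
    excess_kurtosis = sum(((x - mean) / std) ** 4 for x in values) / n - 3
    return {
        # First moment: central tendency
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        # Second moment: dispersion
        "variance": variance,
        "std": std,
        "range": max(values) - min(values),
        # Third moment: asymmetry of the distribution
        "skewness": skewness,
        # Fourth moment: peakedness / tail weight relative to a normal curve
        "excess_kurtosis": excess_kurtosis,
    }


sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
summary = moments(sample)
for name, value in summary.items():
    print(f"{name}: {value}")
```

For this sample the skewness is positive, which indicates a longer right tail.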
Further, model building, also known as data mining or machine learning, will be covered thoroughly. It involves supervised learning, unsupervised learning, and forecasting, all of which will be explored through several model-building techniques such as simple linear regression, multiple linear regression, logistic regression, decision trees, and Naive Bayes.
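To make the simplest of these techniques concrete, here is a minimal sketch of simple linear regression fitted by ordinary least squares in plain Python. The data points and function names are hypothetical. In practice a library such as scikit-learn or statsmodels would normally be used instead.

```python
def fit_simple_linear_regression(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = (sum of cross-deviations of x and y) / (sum of squared deviations of x).
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x_new, slope, intercept):
    """Predict the output for a new input using the fitted line."""
    return intercept + slope * x_new


# Hypothetical input (x) and output (y) values lying exactly on y = 1 + 2x.
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]

slope, intercept = fit_simple_linear_regression(x, y)
prediction = predict(6, slope, intercept)
print(slope, intercept, prediction)
```

Multiple linear regression extends the same idea to several input variables, while logistic regression applies it to a categorical output such as default versus no default.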
The last few steps of CRISP-ML(Q) are Evaluation, Model Deployment, and Monitoring & Maintenance.
The learning journey covers CRISP-ML(Q), Data Science, and EDA, all using Python. A thorough understanding of these topics will help you build a career in the field of data science.