
Learn to plan and manage data science projects using CRISP-DM, TDSP, and Agile Data Science 2.0, with a focus on business problem definition, data science problem formulation, situation assessment, and scheduling deliveries.
Explore the fundamentals of data science project planning, including why data science is needed, challenges, and core activities like business problem definition, data science problem formulation, situation assessment, and scheduling.
Sets the context for this course by briefly explaining what data science is and why we need it.
Main challenges that lead to Data Science project failures
A brief introduction to the core activities of data science project planning and essential components of a project plan.
Learn to formulate a well-defined business problem that guides a data science project and helps avoid failure, by working through the steps involved and reviewing example problem statements in preparation for the next section.
Importance of business problem definition. Consequences of poor business problem definition; advantages of well-defined problems. Tasks involved in business problem definition.
Identifying types of business problems; examples of business problems
This lecture will describe how to identify project stakeholders and gather their inputs
This lecture will describe how to document the inputs obtained from stakeholders and assess each stakeholder's influence and interest in the project
Provides pointers to the sources of information about the previous work done to solve the business problem. Also explains why one should review the previous work.
This lecture is about creating a Business Problem Statement and getting the Stakeholder Buy-in for the same.
Map business problems to data science problems using CRISP-DM, select models for classification, regression, clustering, anomaly detection, association, and recommendation, and establish a data flow pipeline with clear metrics.
Introduction to the various phases of CRISP-DM, a popular data science project life cycle based on the scientific method for solving problems
A brief mention of steps involved in formulating a data science problem
Conceptual understanding of the Classification problem
Conceptual understanding of the Regression problem
Conceptual understanding of the Clustering problem
Conceptual understanding of the Anomaly Detection problem
Conceptual understanding of the Association problem
Conceptual understanding of the Recommendation problem
All six data science problem types discussed in the previous lectures, summarized in tabular form with additional examples, along with a single-page visual recap.
A brief discussion of the three areas on which data science project goals should focus, namely model development, establishing a data flow pipeline, and documentation
This overview lecture will briefly mention:
a) the metrics used to evaluate model quality, deployment, and monitoring
b) the metrics used to evaluate the efficiency and effectiveness of the data flow pipeline
c) checkpoints to evaluate documentation quality
Definitions of Accuracy, Precision, Recall, F1 Score with examples
Definitions of Metrics - Precision, Recall, F1 Score; Accuracy Paradox;
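As an illustration of the classification metrics named above, the following is a minimal sketch rather than course material. The function name `classification_metrics` and the toy labels are assumptions made for this example. It computes Accuracy, Precision, Recall, and F1 Score from raw labels, then shows the Accuracy Paradox on a hypothetical imbalanced data set.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical imbalanced data: 95 normal cases (0) and 5 anomalies (1).
y_true = [0] * 95 + [1] * 5
# A useless model that always predicts "normal".
always_negative = [0] * 100

paradox = classification_metrics(y_true, always_negative)
print(paradox)
```

The always-negative model scores high on accuracy while its recall and F1 are zero, which is why accuracy alone can mislead on imbalanced problems such as anomaly detection.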
Metrics discussed - Root Mean Square Error (RMSE), Coefficient of Determination (R-Squared)
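The two regression metrics above can be sketched in a few lines. This is an illustrative example with made-up numbers; the function names `rmse` and `r_squared` are assumptions for this sketch, not names used in the course.

```python
import math


def rmse(y_true, y_pred):
    """Root Mean Square Error: square root of the mean squared residual."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(y_true, y_pred):
    """Coefficient of Determination: 1 - (residual SS / total SS)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical actual and predicted values.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 9.0]

print(round(rmse(actual, predicted), 4))
print(round(r_squared(actual, predicted), 4))
```

RMSE is expressed in the units of the target variable, whereas R-Squared is unitless, so the two are usually reported together.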
Dunn Index & Silhouette Coefficient
Rand Index & Jaccard Index
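Both indices compare a clustering against reference labels by counting pairs of points. The sketch below uses the common pair-counting definitions; the function names and the toy labels are assumptions for illustration only.

```python
from itertools import combinations


def pair_counts(labels_true, labels_pred):
    """Count point pairs by whether each labelling groups them together."""
    both = true_only = pred_only = neither = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true and same_pred:
            both += 1
        elif same_true:
            true_only += 1
        elif same_pred:
            pred_only += 1
        else:
            neither += 1
    return both, true_only, pred_only, neither


def rand_index(labels_true, labels_pred):
    """Fraction of pairs on which the two labellings agree."""
    both, true_only, pred_only, neither = pair_counts(labels_true, labels_pred)
    return (both + neither) / (both + true_only + pred_only + neither)


def jaccard_index(labels_true, labels_pred):
    """Like the Rand Index, but ignoring pairs separated in both labellings."""
    both, true_only, pred_only, _ = pair_counts(labels_true, labels_pred)
    denominator = both + true_only + pred_only
    return both / denominator if denominator else 0.0


# Hypothetical reference labels and cluster assignments for four points.
reference = [0, 0, 1, 1]
clusters = [0, 0, 1, 2]

print(round(rand_index(reference, clusters), 4))
print(round(jaccard_index(reference, clusters), 4))
```

Because the Jaccard Index drops the pairs that are separated in both labellings, it is usually lower than the Rand Index when there are many clusters.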
Support, Confidence, Lift
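For an association rule A -> B, the three metrics above can be computed directly from a list of transactions. The following is a minimal sketch on a hypothetical market-basket data set; the function names and the transactions are assumptions for this example.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent)."""
    joint = support(transactions, set(antecedent) | set(consequent))
    return joint / support(transactions, antecedent)


def lift(transactions, antecedent, consequent):
    """Confidence relative to the baseline frequency of the consequent."""
    return (confidence(transactions, antecedent, consequent)
            / support(transactions, consequent))


# Hypothetical market-basket transactions.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

rule_support = support(baskets, {"bread", "milk"})
rule_confidence = confidence(baskets, {"bread"}, {"milk"})
rule_lift = lift(baskets, {"bread"}, {"milk"})
print(rule_support, rule_confidence, rule_lift)
```

A lift below 1, as in this toy data, suggests that buying bread makes milk slightly less likely than its baseline rate, so the rule would not be useful despite its fairly high confidence.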
Metrics to measure Prediction Error - Mean Absolute Error (MAE), Root Mean Square Error (RMSE)
Metric to measure Relevance - Mean Average Precision (MAP)
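Mean Average Precision has several variants in the literature. The sketch below uses one common definition, where the average precision of a ranked list is the mean of precision-at-rank over the ranks at which a relevant item appears, divided by the number of relevant items. The user names, items, and function names are hypothetical.

```python
def average_precision(ranked_items, relevant_items):
    """Average of precision@rank taken at each rank holding a relevant item."""
    if not relevant_items:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked_items, start=1):
        if item in relevant_items:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant_items)


def mean_average_precision(recommendations, relevance):
    """Mean of the per-user average precision scores."""
    scores = [average_precision(recommendations[user], relevance[user])
              for user in recommendations]
    return sum(scores) / len(scores)


# Hypothetical ranked recommendations and the items each user found relevant.
recommended = {
    "user_1": ["a", "b", "c", "d"],
    "user_2": ["x", "y", "z"],
}
relevant = {
    "user_1": {"a", "c"},
    "user_2": {"y"},
}

map_score = mean_average_precision(recommended, relevant)
print(round(map_score, 4))
```

Unlike MAE and RMSE, which score predicted ratings, this metric rewards placing relevant items near the top of the list.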
Metrics to measure Diversity, Coverage and Serendipity
Multivariate Testing; Effect Size; Statistical Power
Population Stability Index (PSI); Resource Consumption; Cost-Effectiveness
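The Population Stability Index compares the binned distribution of a variable at monitoring time with its distribution at training time. The sketch below uses the usual formula, the sum over bins of (actual - expected) * ln(actual / expected); the bin counts, the function name, and the small floor value are assumptions for illustration.

```python
import math


def population_stability_index(expected_counts, actual_counts, floor=1e-6):
    """PSI over matching bins; `floor` avoids log(0) for empty bins."""
    expected_total = sum(expected_counts)
    actual_total = sum(actual_counts)
    psi = 0.0
    for expected, actual in zip(expected_counts, actual_counts):
        expected_share = max(expected / expected_total, floor)
        actual_share = max(actual / actual_total, floor)
        psi += (actual_share - expected_share) * math.log(
            actual_share / expected_share)
    return psi


# Hypothetical bin counts of a model score at training time and in production.
training_bins = [25, 25, 25, 25]
production_bins = [10, 20, 30, 40]

psi_value = population_stability_index(training_bins, production_bins)
print(round(psi_value, 4))
```

A PSI of zero means the two distributions match bin for bin, and larger values indicate a bigger shift. The thresholds that trigger retraining are a project-level decision and should be recorded in the plan.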
Availability; Latency; Throughput; Integrity; Scalability; Security; Privacy
Project Documents; Main Contents; Structure; Writing Style; Visuals
Assess the organization’s current situation and gather information on factors that may impact a data science project. Build awareness through the situation assessment to plan the project effectively.
Enumerates the factors that should be assessed while setting goals and planning a data science project
Skills needed, Team Roles & Team Attributes
Assessment of available Data, Knowledge and Computing resources
Different kinds of requirements the project may have to satisfy; assumptions underlying the project plan; constraints under which the project may have to operate.
General project risks; data related risks; risk assessment criteria; mitigation and contingency planning
Need for a glossary of terminologies
Factors to consider while conducting cost/benefit analysis
Learn how time drives data science project planning by outlining deliverables, activities, and team roles for each project phase, and discover what it takes to create an effective schedule.
Overview of key deliverables, activities and team roles for each phase in CRISP-DM lifecycle
Discusses the attributes of an effective schedule
Explore the CRISP-DM lifecycle, its limitations, and two emerging methods that enhance it, revealing the latest trends in the data science project life cycle.
A brief overview of how emerging methods extend CRISP-DM through an Agile approach
Overview of Key components of Team Data Science Process
Overview of Agile Data Science Manifesto and the main topics discussed in the book Agile Data Science 2.0 by Russell Jurney
Recap key points from the course, then compare how to review a data science project plan from business and data science perspectives, and highlight essential planning considerations.
A quick summary of main contents of this course.
List of questions, from both the business and the data science problem perspectives, that the project plan needs to answer satisfactorily before proceeding further
Important points to bear in mind while planning a data science project.
Plan your data science project in a systematic and effective way by applying what you learned. Raise questions and rate the course with your valuable feedback.
Success of any project depends highly on how well it has been planned. Data science projects are no exception.
A large number of data science projects in industrial settings fail to meet expectations due to a lack of proper planning at their inception stage.
This course will provide an overview of the core planning activities that are critical to the success of any data science project.
We will discuss the concepts underlying Business Problem Definition, Data Science Problem Definition, Situation Assessment, and Scheduling Tasks and Deliveries.
The concepts learned will help the students in:
A) Framing the business problem
B) Getting buy-in from the stakeholders
C) Identifying an appropriate data science solution that can solve the business problem
D) Defining success criteria and metrics to evaluate the key project deliverables, namely models, the data flow pipeline, and documentation
E) Assessing the prevailing situation impacting the project, for example the availability of data and resources, risks, estimated costs, and perceived benefits
F) Preparing delivery schedules that provide customers with valuable, actionable insights early and in continuous increments
G) Understanding the desired team attributes and communication needs