
Let's thoroughly describe the journey that is ahead of you in the course! I will also provide you with some quick tips for efficient learning.
Get an overview of the concepts covered within the first section of the course.
Let's start our journey by learning that the goal of Data Science is to turn data into valuable information.
We explore the essential approaches that Data Science uses to achieve this goal.
We, as humans, need to make decisions every day. It can be of tremendous help if data can support us in decision making!
Are data flawless? Certainly not, so in this lecture we learn about a handful of possible issues with the data.
Can we always and fully rely on what the data says? Not quite, as with some approaches, uncertainty comes into play and randomness is always present.
This learning story will walk us through a case where our data collection method can lead us to very wrong conclusions.
Believe it or not, our human mind is fairly limited when it comes to thinking in many dimensions as well as working with larger amounts of data.
Due to the limitations of our mind, we attempt to create Data Science models that simplify the phenomena around us through experimentation or observation.
Let's quickly recap the key concepts from this chapter in a 5-minute article!
Honestly, if you google "what is Data Science" today, you are met with something of a mess. In this lecture, we explain why that is the case and introduce this section of the course.
Statistics is the original predecessor of data science. In this lecture, we examine what modern data science inherits from its predecessor.
The second discipline of data science is databases. We need our data to be stored safely, in a way that keeps it accessible and well described.
Over the past decades, we have seen a rise in the amount of data in the world. We call this phenomenon Big Data. Unfortunately, this discipline of data science is often misunderstood.
The key concept to understand about Big Data is that data scientists are now expected to work with datasets that do not have a data science purpose. Let's explore it within this lecture.
With the rise of Big Data, data scientists had to partially step away from rigid statistical modeling and use less formal data mining methods.
Data Science looks for patterns in the data. This can, however, be a costly process, so practitioners often turn their hopes to machine learning.
Occasionally, we are working with very complex functions such as visual recognition or natural language processing. In such cases, we need to resort to a subset of machine learning called deep learning.
The key thing to understand about artificial intelligence is that it refers to larger systems that are capable of acting or performing some intelligent task independently.
Now that we understand what data science is and which disciplines contribute to it, let's turn our attention to who data scientists are!
The first essential skill of a data scientist is to have a data science mindset (and to apply it, of course). We will hence touch upon being ethical, down-to-earth, and skeptical!
The key part of the vertical component of the T-shaped skillset is the ability to work with rectangular data. A data scientist needs to preprocess, visualize, and model it. We discuss this within this lecture!
Once a data scientist masters rectangular data, s/he might proceed to a specialized area around deep learning. This comes with some new tools and nuances.
The technical wing of the T-shaped skillset is about the skills related to infrastructure - such as data engineering and cloud engineering.
The soft wing of the T-shaped skillset concerns the domain within which data science should create value. A data scientist should also be able to communicate and collaborate with others within the organization.
Let's recap the most crucial concepts from this chapter through a brief article!
Get an overview of the concepts covered within the third section of the course.
Let's imagine that our friend asks us to describe our hobby to him. We will use this narrative to talk about the first step in working with data - describing it.
Describing data is an important step at the outset of every Data Science use case as it allows us to understand the essence of the dataset that we have at hand.
Let's get an overview of the essential descriptive methods of Data Science. Within the first part, we start with the measures of position and central tendency.
We continue to fill our toolkit of descriptive statistics with measures of spread. We build a concrete example of a bakery that intends to bake loaves of bread with a stable weight!
Finally, we arrive at data visualization. In this lecture, we talk about a single framework that allows us to organise our visualization efforts.
Every Data Science method has its pitfalls. In this lecture, we examine a pitfall even in such simple methods as measures of central tendency.
It is time to move forward from plain data description to data exploration. What is the difference between these two approaches?
Have you ever thought about purchasing or renting a property? It is really troublesome to arrive at a definite choice. This is because our mind is fairly limited, which of course also has implications when we explore the data.
Correlation is a powerful exploratory method of Data Science. Let's learn how it is calculated and the essential intuition behind this method.
Even though correlation is a powerful concept, it isn't without pitfalls. We build on our example from the previous lecture to observe how correlation can fail us.
Have you, as a child, heard the story that storks bring babies? Let's look into it and learn about an important pitfall of spurious correlation thanks to it.
In this lecture, we answer the question of why we should not rely on a spurious correlation. We do so through an example where a certain football match can supposedly predict the result of a presidential election.
Let's recap the key concepts from the third chapter before taking the quiz.
Get an overview of the concepts covered within the fourth section of the course.
We usually only have data about a sample from a population. In this lecture, we start the process of learning from a sample, and inferring or predicting something about an entire population. In the first part of this lecture, we discuss how we define a population.
In the second part of the lecture, we describe a sample and the most important aspect of it - representativeness of the population from which this sample was drawn.
In order to draw useful conclusions or models out of a sample, the sample must be representative of the population that we have in mind. This is not always easy, as you will see in an example of building an app that should distinguish poisonous mushrooms from edible ones.
It is finally time to apply an inferential method! Within the first part of the lecture, we set up a business problem that we will solve through inference.
With the problem set up, we proceed to compare the samples and apply a t-test. This will allow us to conclude whether our sales strategy is having an impact.
Our attempt failed and we could not generalize the difference that we discovered between the samples. Never mind, in this lecture we will explore how we can continue with our experiment.
Data Scientists believe that the world around them works in functions. We then hope to estimate, or approximate, these functions and construct a useful model this way.
We already know how to generalize patterns learned from a sample to an entire population. When do we need these complex predictive models? We answer the question in this lecture.
A predictive model consists of several components. In this lecture, we go through them one by one to learn the essence of predictive model building.
There are various types of predictive models, such as supervised or unsupervised. In this lecture, we cover the basic distinctions.
A predictive model is never perfect, as the data, or the process of deriving patterns out of it, will always contain some source of bias or noise.
As you know already, every Data Science method has its pitfalls. Let's try to build a visual recognition model and see how it fails.
It is finally time to conclude the chapter by asking whether our model is really having the intended impact. In this lecture, we learn how we can reuse what we learned at the beginning of the chapter: the inferential test.
Let's quickly recap the essential concepts from the fourth section of the course before we take an assessment.
As there are questions and interest about the "deer use case" from our assignment within the last chapter of the course, I am happy to provide some further interesting information on how the data was collected for this use case!
A handful of recommendations from me about books which might be a worthy pickup after this course.
Let's revisit the T-shaped skillset of a data scientist. I would like to provide you with a few personal tips if you intend to grow into data science further.
Understanding how we can derive valuable information from data has become an everyday expectation. Previously, organizations looked up to data scientists. Nowadays, organizations democratize data science. Everyone can contribute to the efforts of turning data into valuable information. Thus, even if you do not aspire to be a data scientist, open the door to these projects for yourself by gaining the necessary intuitive understanding. With this course, you can take the first step into the world of data science! This course will explain how data science models create value from the absolute basics, even if you feel like a complete beginner to the topic.
Three data scientists with a cumulative 15 years of professional and academic experience deliver the course. Hence, we won't repeat the textbooks. We will uncover a valuable bit of this lucrative field with every lecture and take you closer to your desired future role around data science projects. We do not teach the programming aspects of the field. Instead, we focus entirely on a conceptual understanding of data science. As practice shows, real-world projects benefit tremendously from involving practitioners with thorough, intuitive knowledge.
Over 6 hours of content, consisting of top-notch video lectures, state-of-the-art assignments, and intuitive learning stories from the real world. The narrative will be straightforward to consume. Instead of boring you with lengthy definitions, the course will enlighten you through dozens of relatable examples. We will put ourselves in the shoes of ice cream vendors, environmentalists examining deer migrations, researchers wondering whether storks bring babies, and much more! After the course, you will be aware of the basic principles, approaches, and methods that allow organizations to turn their datasets into valuable and actionable knowledge!
The course structure follows an intuitive learning path! Here is an outline of chapters and a showcase of questions that we will answer:
Chapter 1: "Defining data science". We start our journey by defining data science from multiple perspectives. Why are data so valuable? What is the goal of data science? In which ways can a data science model be biased?
Chapter 2: "Disciplines of Data Science". We continue by exploring individual disciplines that together create data science - such as statistics, big data, or machine learning. What is the difference between artificial intelligence and machine learning? Who is a data scientist, and what skills does s/he need? Why do data science use cases appear so complex?
Chapter 3: "Describing and exploring data". We tackle descriptive and exploratory data science approaches and discover how these can create valuable information. What is a correlation, and when is it spurious? What are outliers, and why can they bias our perceptions? Why should we always study measures of spread?
Chapter 4: "Inference and predictive models". Here, we focus on inferential and predictive approaches. Is Machine Learning our only option when creating a predictive model? How can we verify whether a new sales campaign is successful using statistical inference?
Chapter 5: "Bonus section". We provide personal tips on growing into data science, recommended reading lists, and more!
We bring real-life examples through easy-to-consume narratives instead of boring definitions. These stories cover the most critical learnings in the course, and the story-like descriptions will make them easier to remember and take away. Examples:
The "Do storks bring babies?" story will teach us the key differences between correlation, causation, and spurious correlation.
The "Are we seeing a dog or a wolf?" story will explain why it is crucial not to blindly trust a Machine Learning model, as it might learn unfortunate patterns.
The "Is the mushroom edible?" case will show a project that might be a complete failure simply because of the biased dataset that we use.
The "Which house is the right one?" story will explain why we frequently want to rely on Machine Learning if we want to discover some complex, multi-dimensional patterns in our data.
"I love the yellow walkman!" is a case from 20 years ago, when a large manufacturer was considering launching a new product. Had they relied on what people said instead of what the data said, they would have had a distorted view of reality!
"Don't trust the HIPPO!" is a showcase of what is, unfortunately, happening in many organizations worldwide. People tend to trust the Highest Paid Person's Opinion instead of trusting what the data says.
The course is interactive! Here is what you will find:
Assignments in which you can practice the learned concepts and apply your creative and critical thinking.
Quizzes in which you can demonstrate that you have gained the knowledge from the course.
Handouts that you can take away and even print for your future reference!
Shareable materials that you can use in your daily work to convey a vital Data Science message.
References and links to valuable materials and powerful examples of Data Science in action.
Important reminder: This course does not teach the programming aspects of the field. Instead, it covers the conceptual and business learnings.