Every business and organization that collects data can tap into that data for insights on how to improve. Haskell is a purely functional, lazy programming language that is well suited to large data analysis problems. This video picks up where Beginning Haskell Data Analysis leaves off, taking you through the more difficult problems of data analysis in a conversational style.
You will be guided through finding correlations in data and working with multiple dependent variables. You will get a theoretical overview of the types of regression, and we'll show you how to install the LAPACK and HMatrix libraries. By the end of the first part, you'll be comfortable applying n-grams and TF-IDF.
Once you’ve learned how to analyze data, the next step is organizing it with the help of machine learning algorithms. You will be briefed on the underlying mathematics and statistics, such as Bayes' law and its applications, as well as eigenvalues and eigenvectors using HMatrix.
By the end of this course, you'll understand data analysis, the different ways to approach it, and the various clustering algorithms available. You'll also be comfortable enough with Haskell to write your own code in it.
About the Author
James Church lives in Clarksville, Tennessee, United States, where he enjoys teaching, programming, and playing board games with his wife, Michelle. He is an assistant professor of computer science at Austin Peay State University and has consulted on data analysis for various companies, including a chemical laboratory. James is the author of Learning Haskell Data Analysis.
Data frequently arrives as raw CSV files, which are cumbersome to query. We'll take a CSV file and convert it into an SQLite3 database.
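As a sketch of the idea (not necessarily the exact code from the video), here is how the conversion might look using the sqlite-simple package and a naive comma splitter; the file name, table name, and schema are invented for illustration:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Database.SQLite.Simple (close, execute, execute_, open)

-- Naive comma split; a real CSV parser (e.g. cassava) also handles
-- quoting and embedded commas.
splitCSV :: String -> [String]
splitCSV s = case break (== ',') s of
  (field, [])     -> [field]
  (field, _:rest) -> field : splitCSV rest

main :: IO ()
main = do
  rows <- map splitCSV . lines <$> readFile "population.csv"
  conn <- open "population.sqlite3"
  execute_ conn "CREATE TABLE IF NOT EXISTS population (year INTEGER, count REAL)"
  mapM_ (\row -> case row of
           [y, c] -> execute conn "INSERT INTO population VALUES (?, ?)"
                       (read y :: Int, read c :: Double)
           _      -> pure ())   -- skip malformed rows
        (drop 1 rows)           -- drop the header row
  close conn
```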
Raw data often doesn't meet our specifications, so it has to be cleaned up before analysis.
Data is hard to understand unless we visualize it. We'll do that in this video.
It's hard to get a feel for the shape of data just by plotting it. For that, we need kernel density estimation (KDE).
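To give a taste of the technique, here is a minimal Gaussian KDE in plain Haskell; picking the bandwidth h is up to you (a sketch, not the video's code):

```haskell
-- Gaussian kernel density estimate at point x, given bandwidth h and
-- sample points xs: average the kernel contributions of all samples.
kde :: Double -> [Double] -> Double -> Double
kde h xs x = sum (map kernel xs) / (fromIntegral (length xs) * h)
  where
    kernel xi = exp (-(u * u) / 2) / sqrt (2 * pi)
      where u = (x - xi) / h
```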
Data often has gaps, and we'd like to estimate what a data point might have been had we observed it. Linear regression attempts to solve this.
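For a single variable, ordinary least squares can be sketched straight from the textbook formulas (the function name is ours):

```haskell
-- Ordinary least squares for y = a + b*x over paired samples,
-- returning (intercept, slope).
linearFit :: [(Double, Double)] -> (Double, Double)
linearFit pts = (ybar - b * xbar, b)
  where
    n        = fromIntegral (length pts)
    (xs, ys) = unzip pts
    xbar     = sum xs / n
    ybar     = sum ys / n
    b        = sum [ (x - xbar) * (y - ybar) | (x, y) <- pts ]
             / sum [ (x - xbar) ^ 2 | x <- xs ]
```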
The data we are inspecting (the year and the population) seem to be related, but how can we be sure? We need to compute the correlation coefficient.
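The Pearson correlation coefficient is compact enough to sketch in plain Haskell (assuming two equal-length samples):

```haskell
-- Pearson correlation: covariance divided by the product of the
-- standard deviations. The result lies in [-1, 1].
pearson :: [Double] -> [Double] -> Double
pearson xs ys = cov / sqrt (varX * varY)
  where
    n    = fromIntegral (length xs)
    xbar = sum xs / n
    ybar = sum ys / n
    cov  = sum (zipWith (\x y -> (x - xbar) * (y - ybar)) xs ys)
    varX = sum [ (x - xbar) ^ 2 | x <- xs ]
    varY = sum [ (y - ybar) ^ 2 | y <- ys ]
```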
Linear regression and correlation coefficients are great! Unfortunately, regression coefficients can be misleading. Here, we'll study a dataset that is purposely misleading.
In video 1 we performed linear regression on our dataset, and in video 2 we studied the results. The numbers appear good, but visually the fit is still bad. We need a new solution. Let's try logarithmic regression!
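A logarithmic fit y = a + b*ln(x) is just a linear fit on transformed inputs. Reusing the linearFit sketch above (and assuming every x is positive):

```haskell
-- Fit y = a + b * log x by running linear regression on (log x, y).
logFit :: [(Double, Double)] -> (Double, Double)
logFit pts = linearFit [ (log x, y) | (x, y) <- pts ]
```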
Okay, so our dataset isn't linear and it isn't logarithmic. What is it? Let's try polynomial regression.
We have lots of data in a database, but before we can work with it in HMatrix, it needs to be in matrix form.
Now that we have our data pulled from the database, we need to perform the actual regression.
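One common route with HMatrix (a sketch, not necessarily the video's exact code) is to build a Vandermonde design matrix and hand it to the least-squares operator (<\>):

```haskell
import Numeric.LinearAlgebra

-- Least-squares polynomial fit: column k of the design matrix holds
-- xs raised to the power k, and (<\>) solves the overdetermined
-- system in the least-squares sense.
polyFit :: Int -> Vector Double -> Vector Double -> Vector Double
polyFit degree xs ys = design <\> ys
  where
    design = fromColumns [ cmap (^ k) xs | k <- [0 .. degree] ]
```

Here polyFit 1 recovers a straight line and polyFit 2 a quadratic; the returned vector holds the coefficients from the constant term upward.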
We performed the regression and we got some results. Are they any good? Let's find out.
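One standard yardstick is the coefficient of determination, r squared, sketched here over observed and predicted values:

```haskell
-- r-squared: 1 minus the ratio of residual to total sum of squares.
-- A value near 1 means the model explains most of the variance.
rSquared :: [Double] -> [Double] -> Double
rSquared observed predicted = 1 - ssRes / ssTot
  where
    ybar  = sum observed / fromIntegral (length observed)
    ssRes = sum (zipWith (\y f -> (y - f) ^ 2) observed predicted)
    ssTot = sum [ (y - ybar) ^ 2 | y <- observed ]
```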
We found in the previous video that our scores could not and should not be trusted. As an intellectual exercise, let's explore how we might improve the score anyway.
Let's explore text analysis, shall we? First, we have to clean our datasets.
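Cleaning is corpus-specific, but a crude first pass might lowercase everything and strip anything that isn't a letter or a space (a sketch only):

```haskell
import Data.Char (isAlpha, isSpace, toLower)

-- Lowercase the text and keep only letters and whitespace, so the
-- result still splits cleanly into words.
cleanText :: String -> String
cleanText = map toLower . filter (\c -> isAlpha c || isSpace c)
```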
Our dataset is clean. How do we put it to use? We'll start by computing n-grams.
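In plain Haskell, an n-gram function is a one-liner over Data.List.tails (a sketch):

```haskell
import Data.List (tails)

-- Every length-n window of a token list, e.g.
--   ngrams 2 (words "to be or not to be")
--   == [["to","be"],["be","or"],["or","not"],["not","to"],["to","be"]]
ngrams :: Int -> [a] -> [[a]]
ngrams n tokens = [ take n t | t <- tails tokens, length t >= n ]
```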
Now that we can find the n-grams of a dataset, let's do something cool with them!
We're changing pace here. Let's talk about TF-IDF, a measure of how important a word is to a document.
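In symbols, tf-idf multiplies a term's frequency within one document by the log of its rarity across the corpus. A sketch using the plain log formulation (and assuming the term occurs in at least one document, so the count is never zero):

```haskell
import Data.List (genericLength)

-- tf: how often the term appears in this document, as a fraction.
-- idf: log of (number of documents / documents containing the term).
tfIdf :: String -> [String] -> [[String]] -> Double
tfIdf term doc corpus = tf * idf
  where
    tf  = genericLength (filter (== term) doc) / genericLength doc
    df  = genericLength (filter (term `elem`) corpus)
    idf = log (genericLength corpus / df)
```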
Good, we've learned what we need to know about TF-IDF. Let's apply it.
One problem with clustering is that it's hard to get good clustering data. We can solve this problem by just generating our own data.
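One crude generator: scatter n points uniformly in a box around a chosen center (the names and the uniform-box choice are ours for illustration):

```haskell
import Control.Monad (replicateM)
import System.Random (randomRIO)

-- n random 2-D points within +/- radius of the given center.
makeCluster :: Int -> (Double, Double) -> Double -> IO [(Double, Double)]
makeCluster n (cx, cy) radius = replicateM n $ do
  dx <- randomRIO (-radius, radius)
  dy <- randomRIO (-radius, radius)
  pure (cx + dx, cy + dy)
```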
We have clusters in our dataset, but how far apart are they? We'll use the "centroid" solution.
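The centroid of a cluster is just the component-wise mean, sketched here for 2-D points:

```haskell
-- Mean x and mean y of a non-empty list of points.
centroid :: [(Double, Double)] -> (Double, Double)
centroid pts = (sum xs / n, sum ys / n)
  where
    n        = fromIntegral (length pts)
    (xs, ys) = unzip pts
```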
We have data which needs clustering. Let's use k-means clustering.
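Here's a compact sketch of Lloyd's algorithm, reusing the centroid function above; production code would guard against empty clusters and choose its starting centers carefully:

```haskell
import Data.List (minimumBy)
import Data.Ord (comparing)

type Point = (Double, Double)

dist2 :: Point -> Point -> Double
dist2 (x1, y1) (x2, y2) = (x1 - x2) ^ 2 + (y1 - y2) ^ 2

-- One iteration: assign each point to its nearest center, then move
-- each center to the centroid of the points assigned to it.
step :: [Point] -> [Point] -> [Point]
step pts centers = map centroid clusters
  where
    nearest p = minimumBy (comparing (dist2 p)) centers
    clusters  = [ [ p | p <- pts, nearest p == c ] | c <- centers ]

-- Run a fixed number of iterations from an initial guess.
kMeans :: Int -> [Point] -> [Point] -> [Point]
kMeans iters pts = (!! iters) . iterate (step pts)
```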
We need to cluster our data. How can we do this in a different manner?
How do recommendation engines work? We have lots and lots of data, but really we only need a subset of data in order to make recommendations.
How do we prepare our dataset?
How do we perform eigendecomposition and what is it good for?
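With HMatrix, eigendecomposition of a general square matrix is one call to eig, which returns the eigenvalues and eigenvectors (complex-valued in general). A tiny sketch:

```haskell
import Numeric.LinearAlgebra

main :: IO ()
main = do
  let m = (2 >< 2) [2, 1, 1, 2] :: Matrix Double
      (vals, vecs) = eig m
  print vals  -- eigenvalues (here 3 and 1, up to ordering)
  print vecs  -- corresponding eigenvectors, one per column
```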
We still have our really big dataset. Let's make it smaller. That's easy.
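One standard shrinking technique is a truncated SVD: keep only the k largest singular values and the matching singular vectors. A sketch of the general technique with HMatrix:

```haskell
import Numeric.LinearAlgebra

-- Best rank-k approximation of m: truncate the thin SVD to the
-- top k singular values.
reduceRank :: Int -> Matrix Double -> Matrix Double
reduceRank k m = u' <> diag s' <> tr v'
  where
    (u, s, v) = thinSVD m
    u' = takeColumns k u
    s' = subVector 0 k s
    v' = takeColumns k v
```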
Now that we've reduced our dataset, let's make some recommendations with it.
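Many recommenders score candidates by the similarity of rating vectors; cosine similarity is a common building block and easy to sketch:

```haskell
-- Cosine of the angle between two rating vectors: 1 means identical
-- direction, 0 means unrelated. Assumes neither vector is all zeros.
cosine :: [Double] -> [Double] -> Double
cosine u v = dot u v / (norm u * norm v)
  where
    dot a b = sum (zipWith (*) a b)
    norm a  = sqrt (dot a a)
```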
Packt has been committed to developer learning since 2004. A lot has changed in software since then, but Packt has remained responsive to these changes, continuing to look forward at the trends and tools defining the way we work and live, and how to put them to work.
With an extensive library of content (more than 4,000 books and video courses), Packt's mission is to help developers stay relevant in a rapidly changing world. From new web frameworks and programming languages to cutting-edge data analytics and DevOps, Packt takes software professionals in every field to what's important to them now.
From skills that will help you develop and future-proof your career to immediate solutions to everyday tech challenges, Packt is a go-to resource for becoming a better, smarter developer.
Packt Udemy courses continue this tradition, bringing you comprehensive yet concise video courses straight from the experts.