Java Data Science Solutions - Analyzing Data
0.0 (0 ratings)
2 students enrolled

Solutions to help you overcome your data science hurdles using Java
Created by Packt Publishing
Last updated 7/2017
  • 1.5 hours on-demand video
  • 1 Supplemental Resource
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion
What Will I Learn?
  • Find out how to clean datasets and prepare them for analysis by removing noise and outliers, so you can extract real insights.
  • Develop the skills to use modern machine learning techniques to retrieve information and transform data into knowledge.
  • Familiarize yourself with cutting-edge techniques to store and search large volumes of data.
  • Collect and analyze statistics from data using the Apache Commons Math API.
  • Learn higher-level concepts such as statistical significance tests.
Requirements
  • Familiarity with the fundamentals of data science.

If you want to build data science models that are ready for production, Java is a strong choice. This video course provides modern solutions to common and not-so-common data science problems. We start with solutions that help you obtain, clean, index, and search data. Then you will learn a variety of techniques to analyze data. By the end of this course, you will be able to perform the advanced operations needed to analyze complex data and to index and search it.

About The Author

Rushdi Shams has a PhD in the application of machine learning to Natural Language Processing (NLP) problem areas from Western University, Canada. Before starting work as a machine learning and NLP specialist in industry, he taught undergraduate and graduate courses. He maintains a YouTube channel named Learn with Rushdi for learning computer technologies.

Who is the target audience?
  • This course is for Java developers who are familiar with the fundamentals of data science and want to improve their skills to become a pro.
Curriculum For This Course
30 Lectures
Java Data Science Solutions - Analyzing Data
7 Lectures 16:06

This video will give an overview of the entire course.      

Preview 01:46

In this video, we take a look at how to retrieve the file paths and names from a complex directory structure that contains numerous directories and files inside a root directory.           

Retrieving All Filenames from Hierarchical Directories Using Java
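
The recursive approach described for this lecture can be sketched with the plain JDK as follows. This is a minimal illustration, not the course's own code: the class name FileLister and its method names are hypothetical, and unreadable directories are simply skipped.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

final class FileLister {
    /** Collects the absolute paths of all non-directory entries under root. */
    static List<String> listFiles(File root) {
        List<String> paths = new ArrayList<>();
        collect(root, paths);
        return paths;
    }

    private static void collect(File dir, List<String> paths) {
        // listFiles() returns null if dir is not a directory or cannot be read.
        File[] entries = dir.listFiles();
        if (entries == null) {
            return;
        }
        for (File entry : entries) {
            if (entry.isDirectory()) {
                collect(entry, paths); // descend into the subdirectory
            } else {
                paths.add(entry.getAbsolutePath());
            }
        }
    }

    public static void main(String[] args) {
        // Uses the first argument as the root directory, or the working directory.
        File root = new File(args.length > 0 ? args[0] : ".");
        listFiles(root).forEach(System.out::println);
    }
}
```

The order of the returned paths depends on the file system, so sort the list if a stable order matters.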

File names in hierarchical directories can be listed recursively, as demonstrated in the previous video. However, the Apache Commons IO library makes this easier and more convenient, with less code.

Retrieving All Filenames from Hierarchical Directories Using Apache Commons IO

There are different ways to read the contents of a text file. This video demonstrates how to read text file contents all at once using Java 8.

Reading Contents from Text Files All at Once Using Java 8
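
One way to read a whole text file with the Java 8 API is sketched below. It is an assumption that the lecture uses Files.lines; the class name ReadWholeFile and the choice of UTF-8 are illustrative only.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class ReadWholeFile {
    /** Reads every line of the file and joins the lines with '\n'. */
    static String readAll(Path file) throws IOException {
        // Files.lines keeps the file open, so close the stream with try-with-resources.
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            return lines.collect(Collectors.joining("\n"));
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: ReadWholeFile <path>");
            return;
        }
        System.out.println(readAll(Paths.get(args[0])));
    }
}
```

Joining the lines normalizes line endings to '\n' and drops a trailing newline, which is usually acceptable for text analysis but not for byte-exact copies.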

Another way to read text file contents all at once is to use Apache Commons IO. Let’s see how the functionality described in the previous video can be achieved using the Apache Commons IO API.

Reading Contents from Text Files All at Once Using Apache Commons IO

PDF is among the most difficult file types to extract data from. Some PDFs cannot be parsed at all because they are password-protected. This video demonstrates how to extract text from PDF files using Apache Tika.

Extracting PDF Text Using Apache Tika

ASCII text files can contain unwanted character sequences introduced during a conversion process. In this video, we clean several kinds of noise from ASCII text data using regular expressions.

Cleaning ASCII Text Files Using Regular Expressions
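
A small sketch of regex-based cleaning is shown below. The listing does not say which kinds of noise the lecture removes, so the three patterns here (non-ASCII characters, control characters, repeated whitespace) are assumptions, and the class name AsciiCleaner is hypothetical.

```java
final class AsciiCleaner {
    /** Removes non-ASCII and control characters, then collapses whitespace. */
    static String clean(String text) {
        return text
                .replaceAll("[^\\p{ASCII}]", "") // drop anything outside 7-bit ASCII
                .replaceAll("\\p{Cntrl}", " ")   // control characters (tabs, newlines, bells) become spaces
                .replaceAll("\\s+", " ")         // collapse runs of whitespace into one space
                .trim();
    }

    public static void main(String[] args) {
        System.out.println(clean("caf\u00e9  \t data\u0007\n set "));
    }
}
```

The order of the replacements matters: control characters are turned into spaces first so that the whitespace pass can collapse them.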
Parsing and Extracting Data
7 Lectures 27:47

In this video, we will see how to parse CSV files and handle the data points retrieved from them.

Preview 07:21

There are several ways to parse the contents of XML files. In this video, we use JDOM for XML parsing.

Parsing XML Files Using JDOM

Like XML, JSON is a lightweight, human-readable data interchange format. In this video, we write JSON files.

Writing JSON Files Using JSON.Simple

In this video, we will see how we can read or parse a JSON file. As our sample input file, we will be using the JSON file we created in the previous video.        

Reading JSON Files Using JSON.Simple

One of the easiest and handiest ways to extract web data is to use an external Java library named Jsoup. In this video, we use Jsoup to extract web data.

Extracting Web Data from a URL Using Jsoup

A large amount of data can nowadays be found on the Web. This data may be structured, semi-structured, or unstructured. This video uses Selenium WebDriver to extract web data from a website.

Extracting Web Data from a Website Using Selenium Web Driver

Data can be stored in database tables too. In this video, we will read data from a table in a MySQL database.

Reading Table Data from a MySQL Database
Indexing and Searching Data
2 Lectures 14:13

Indexing is the first step toward searching data fast. Internally, Lucene uses an inverted full-text index. In this video, we will demonstrate how to index a large amount of data with Apache Lucene.

Preview 10:00
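
To make the idea of an inverted index concrete, here is a toy version in plain Java. It is not Lucene and does not use Lucene's API: the class TinyInvertedIndex, its tokenization on non-word characters, and the integer document IDs are all assumptions made for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

final class TinyInvertedIndex {
    // Maps each term to the sorted set of document IDs that contain it.
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    /** Tokenizes the text and records docId under every term. */
    void add(int docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    /** Returns the IDs of documents containing the term (empty if none). */
    SortedSet<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySortedSet());
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.add(1, "Java data science");
        index.add(2, "Data indexing with Lucene");
        System.out.println(index.search("data"));
    }
}
```

A lookup touches only the entry for the queried term rather than scanning every document, which is why an index of this shape makes search fast.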

Now that we have indexed our data, we will be searching the data using Apache Lucene in this video.

Searching Indexed Data with Apache Lucene
Analyzing Data Statistically
7 Lectures 14:31

We can generate summary statistics for data by using the SummaryStatistics class. This is similar to the DescriptiveStatistics class used in the preceding video. The major difference is that unlike the DescriptiveStatistics class, the SummaryStatistics class does not store data in memory.

Generating Summary Statistics
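
The point that summary statistics can be computed without storing the data can be illustrated with a running calculation. The sketch below uses Welford's online algorithm in plain Java; it is not the Commons Math SummaryStatistics class, and the name RunningStats and its methods are hypothetical.

```java
final class RunningStats {
    private long n;
    private double mean;
    private double m2; // running sum of squared deviations from the mean
    private double min = Double.POSITIVE_INFINITY;
    private double max = Double.NEGATIVE_INFINITY;

    /** Folds one value into the statistics; the value itself is not kept. */
    void add(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean);
        min = Math.min(min, x);
        max = Math.max(max, x);
    }

    long count() { return n; }
    double mean() { return n > 0 ? mean : Double.NaN; }
    double min() { return n > 0 ? min : Double.NaN; }
    double max() { return n > 0 ? max : Double.NaN; }

    /** Sample variance with the n - 1 denominator; NaN for fewer than two values. */
    double variance() { return n > 1 ? m2 / (n - 1) : Double.NaN; }

    public static void main(String[] args) {
        RunningStats stats = new RunningStats();
        for (double v : new double[] {2, 4, 4, 4, 5, 5, 7, 9}) {
            stats.add(v);
        }
        System.out.println(stats.mean() + " " + stats.variance());
    }
}
```

Memory use stays constant no matter how many values are added; the trade-off is that statistics needing the full data set, such as percentiles, are not available.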

In this video, we will be creating an AggregateSummaryStatistics instance to accumulate the overall statistics and SummaryStatistics for the sample data.

Generating Summary Statistics from Multiple Distributions

The Frequency class has methods to count the number of data instances in a bucket, to count the number of unique data instances, and so on. Let’s explore how we compute a frequency distribution.

Computing Frequency Distribution

This video is quite different from the others in this section, as it deals with strings and with counting word frequencies in a string. We will use both Apache Commons Math and Java 8 for this task.

Counting Word Frequency in a String

This video does not use the Apache Commons Math library to count frequencies of words in a given string; rather, it uses core libraries and mechanisms introduced in Java 8.

Counting Word Frequency in a String Using Java 8
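
A possible Java 8 version using only core libraries is sketched below. The lecture's exact tokenization is not given in this listing, so lowercasing and splitting on non-word characters are assumptions, as is the class name WordFrequency.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

final class WordFrequency {
    /** Counts how often each lowercased word occurs; keys are sorted alphabetically. */
    static Map<String, Long> count(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("\\W+"))
                .filter(word -> !word.isEmpty()) // split can yield a leading empty token
                .collect(Collectors.groupingBy(
                        Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        count("The cat and the hat").forEach(
                (word, n) -> System.out.println(word + " -> " + n));
    }
}
```

Collecting into a TreeMap keeps the output order deterministic, which makes the result easier to compare across runs.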

The unbiased covariance is given by the formula cov(X, Y) = sum[(xi - E(X))(yi - E(Y))] / (n - 1), and Pearson's correlation by the formula cor(X, Y) = sum[(xi - E(X))(yi - E(Y))] / [(n - 1)s(X)s(Y)], where E(X) and E(Y) are the means of the X and Y values and s(X) and s(Y) are their standard deviations. Non-bias-corrected estimates use n in place of n - 1. Let’s see how we calculate them in our code.

Calculating Covariance and Pearson's Correlation of Two Sets of Data Points
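
The two formulas in the description can be written out directly in plain Java. This is a hand-rolled sketch of the arithmetic, not the Commons Math classes the course uses; the class name CovarianceSketch is hypothetical, and s(X) is taken to be the sample standard deviation (n - 1 denominator).

```java
final class CovarianceSketch {
    static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    /** Unbiased covariance: sum[(xi - E(X))(yi - E(Y))] / (n - 1). */
    static double covariance(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two arrays of equal length >= 2");
        }
        double meanX = mean(x);
        double meanY = mean(y);
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            sum += (x[i] - meanX) * (y[i] - meanY);
        }
        return sum / (x.length - 1);
    }

    /** Pearson's correlation: cov(X, Y) / (s(X) s(Y)). */
    static double correlation(double[] x, double[] y) {
        double sdX = Math.sqrt(covariance(x, x)); // cov(X, X) is the sample variance
        double sdY = Math.sqrt(covariance(y, y));
        return covariance(x, y) / (sdX * sdY);
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4, 5};
        double[] y = {2, 4, 6, 8, 10};
        System.out.println(covariance(x, y) + " " + correlation(x, y));
    }
}
```

If either input is constant, its standard deviation is zero and the correlation is undefined (the method then returns NaN).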
Regression Analysis and Testing
7 Lectures 16:49

The SimpleRegression class supports ordinary least squares regression with one independent variable: y = intercept + slope * x, where the intercept is an optional parameter. In this video, the data points are added one at a time.

Preview 02:54
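
The model y = intercept + slope * x, fitted from points added one at a time, can be sketched with running sums and the closed-form least squares solution. This is plain Java written for illustration rather than the Commons Math SimpleRegression class; the name RunningRegression and its methods are hypothetical, and the intercept is always estimated.

```java
final class RunningRegression {
    private long n;
    private double sumX, sumY, sumXX, sumXY;

    /** Adds one observation; only running sums are kept. */
    void addData(double x, double y) {
        n++;
        sumX += x;
        sumY += y;
        sumXX += x * x;
        sumXY += x * y;
    }

    /** Least squares slope; NaN if all x values are identical or no data was added. */
    double slope() {
        double denominator = n * sumXX - sumX * sumX;
        return (n * sumXY - sumX * sumY) / denominator;
    }

    /** Least squares intercept: mean(y) - slope * mean(x). */
    double intercept() {
        return (sumY - slope() * sumX) / n;
    }

    public static void main(String[] args) {
        RunningRegression regression = new RunningRegression();
        regression.addData(1, 3);
        regression.addData(2, 5);
        regression.addData(3, 7);
        System.out.println("y = " + regression.intercept() + " + " + regression.slope() + " * x");
    }
}
```

Raw sums are easy to follow but can lose precision when the x values are large and close together; a production implementation would typically update centered sums instead.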

The OLSMultipleLinearRegression class provides ordinary least squares regression to fit the linear model Y = X*b + u. Here, Y is an n-vector regressand, X is an [n, k] matrix whose k columns are called regressors, b is a k-vector of regression parameters, and u is an n-vector of error terms or residuals. Let’s see how we compute ordinary least squares regression in this video.

Computing Ordinary Least Squares Regression

In this video, we will see another variant of least squares regression named generalized least squares regression. GLSMultipleLinearRegression implements Generalized Least Squares to fit the linear model Y=X*b+u.

Computing Generalized Least Squares Regression

Apache Commons Math supports both one-sample and two-sample t-tests. Two-sample tests can be either paired or unpaired, and unpaired two-sample tests can be conducted with or without the assumption that the subpopulation variances are equal. We demonstrate a paired t-test in this video.

Conducting a Paired T Test
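
The statistic behind a paired t-test is the mean of the pairwise differences divided by its standard error. The sketch below computes only that statistic in plain Java; it is not the Commons Math API, the class name PairedT is hypothetical, and turning t into a p-value needs the t distribution with n - 1 degrees of freedom, which is left to a statistics library.

```java
final class PairedT {
    /** t = mean(d) / sqrt(var(d) / n), where d[i] = first[i] - second[i]. */
    static double tStatistic(double[] first, double[] second) {
        if (first.length != second.length || first.length < 2) {
            throw new IllegalArgumentException("need two samples of equal length >= 2");
        }
        int n = first.length;
        double meanDiff = 0.0;
        for (int i = 0; i < n; i++) {
            meanDiff += first[i] - second[i];
        }
        meanDiff /= n;

        double sumSquares = 0.0;
        for (int i = 0; i < n; i++) {
            double deviation = (first[i] - second[i]) - meanDiff;
            sumSquares += deviation * deviation;
        }
        double variance = sumSquares / (n - 1); // sample variance of the differences
        // If all differences are identical, variance is 0 and the result is infinite or NaN.
        return meanDiff / Math.sqrt(variance / n);
    }

    public static void main(String[] args) {
        double[] first = {5, 7, 9, 11};
        double[] second = {4, 5, 8, 8};
        System.out.println(tStatistic(first, second));
    }
}
```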

For conducting a Chi-square test on two sets of data distributions, one distribution will be called the observed distribution and the other distribution will be called the expected distribution.

Conducting a Chi-Square Test
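
For observed and expected distributions, the chi-square statistic is the sum of (observed - expected)^2 / expected over all categories. The following plain Java sketch computes that sum; it is not the Commons Math API, the class name ChiSquare is hypothetical, and it assumes the expected counts are positive and already scaled to the same total as the observed counts.

```java
final class ChiSquare {
    /** Sum over categories of (observed - expected)^2 / expected. */
    static double statistic(double[] expected, long[] observed) {
        if (expected.length != observed.length || expected.length < 2) {
            throw new IllegalArgumentException("need two arrays of equal length >= 2");
        }
        double sum = 0.0;
        for (int i = 0; i < expected.length; i++) {
            if (expected[i] <= 0) {
                throw new IllegalArgumentException("expected counts must be positive");
            }
            double deviation = observed[i] - expected[i];
            sum += deviation * deviation / expected[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] expected = {20, 20, 20};
        long[] observed = {10, 20, 30};
        System.out.println(statistic(expected, observed));
    }
}
```

A larger statistic means the observed counts deviate more from the expected ones; the p-value comes from the chi-square distribution with (number of categories - 1) degrees of freedom.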

ANOVA stands for Analysis of Variance. In this video, we will see how to use Java to perform a one-way ANOVA test to determine whether the means of three or more independent and unrelated sets of data points are significantly different.

Conducting the One-Way ANOVA Test
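
One-way ANOVA compares the variation between group means with the variation inside the groups through the F statistic. The sketch below computes F in plain Java as an illustration of the calculation; it is not the Commons Math API, and the class name OneWayAnova is hypothetical.

```java
final class OneWayAnova {
    /** F = (SSB / (k - 1)) / (SSW / (N - k)) for k groups with N values in total. */
    static double fStatistic(double[][] groups) {
        int k = groups.length;
        int total = 0;
        double grandSum = 0.0;
        for (double[] group : groups) {
            if (group.length == 0) {
                throw new IllegalArgumentException("groups must be non-empty");
            }
            total += group.length;
            for (double v : group) {
                grandSum += v;
            }
        }
        if (k < 2 || total <= k) {
            throw new IllegalArgumentException("need at least two groups and more values than groups");
        }
        double grandMean = grandSum / total;

        double ssBetween = 0.0; // variation of group means around the grand mean
        double ssWithin = 0.0;  // variation of values around their own group mean
        for (double[] group : groups) {
            double sum = 0.0;
            for (double v : group) {
                sum += v;
            }
            double mean = sum / group.length;
            ssBetween += group.length * (mean - grandMean) * (mean - grandMean);
            for (double v : group) {
                ssWithin += (v - mean) * (v - mean);
            }
        }
        return (ssBetween / (k - 1)) / (ssWithin / (total - k));
    }

    public static void main(String[] args) {
        double[][] groups = {{1, 2, 3}, {2, 3, 4}, {3, 4, 5}};
        System.out.println(fStatistic(groups));
    }
}
```

The p-value follows from the F distribution with (k - 1, N - k) degrees of freedom, which a statistics library can supply.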

The Kolmogorov-Smirnov test (or simply KS test) is a test of equality for one-dimensional probability distributions that are continuous in nature. It is one of the popular methods to determine whether two sets of data points differ significantly.

Conducting a Kolmogorov-Smirnov Test
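
The two-sample KS statistic D is the largest vertical gap between the empirical distribution functions of the two samples. A plain Java sketch of that calculation follows; it is not the Commons Math API, the class name KsStatistic is hypothetical, and the inputs are assumed to contain no NaN values.

```java
import java.util.Arrays;

final class KsStatistic {
    /** Returns D = max over x of |F1(x) - F2(x)| for the two empirical CDFs. */
    static double d(double[] first, double[] second) {
        if (first.length == 0 || second.length == 0) {
            throw new IllegalArgumentException("samples must be non-empty");
        }
        // Sort copies so the caller's arrays are left untouched.
        double[] a = first.clone();
        double[] b = second.clone();
        Arrays.sort(a);
        Arrays.sort(b);

        int i = 0;
        int j = 0;
        double max = 0.0;
        while (i < a.length && j < b.length) {
            double value = Math.min(a[i], b[j]);
            // Step past every observation equal to the current value in both samples,
            // so ties are handled before the CDFs are compared.
            while (i < a.length && a[i] <= value) {
                i++;
            }
            while (j < b.length && b[j] <= value) {
                j++;
            }
            double gap = Math.abs((double) i / a.length - (double) j / b.length);
            max = Math.max(max, gap);
        }
        // Once one sample is exhausted its CDF is 1, so the gap can only shrink.
        return max;
    }

    public static void main(String[] args) {
        System.out.println(d(new double[] {1, 2, 3, 4}, new double[] {3, 4, 5, 6}));
    }
}
```

D ranges from 0 (identical empirical distributions) to 1 (no overlap); deciding significance requires the KS distribution, which is left to a statistics library.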
About the Instructor
Packt Publishing
3.9 Average rating
7,241 Reviews
51,754 Students
616 Courses
Tech Knowledge in Motion

Packt has been committed to developer learning since 2004. A lot has changed in software since then, but Packt has remained responsive to these changes, continuing to look forward at the trends and tools defining the way we work and live, and at how to put them to work.

With an extensive library of more than 4000 books and video courses, Packt's mission is to help developers stay relevant in a rapidly changing world. From new web frameworks and programming languages to cutting-edge data analytics and DevOps, Packt takes software professionals in every field to what's important to them now.

From skills that will help you develop and future-proof your career to immediate solutions to everyday tech challenges, Packt is a go-to resource to make you a better, smarter developer.

Packt Udemy courses continue this tradition, bringing you comprehensive yet concise video courses straight from the experts.