Java Data Science Solutions - Big Data and Visualization

Explore the power of MLlib, DL4j, Weka, and more
Created by Packt Publishing
Last updated 8/2017
  • 2.5 hours on-demand video
  • 1 Supplemental Resource
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion

What Will I Learn?
  • Use machine learning techniques to learn patterns from data
  • Perform clustering and feature selection exercises using the Weka machine learning workbench
  • Learn data import and export, classification, and feature selection using the Java Machine Learning (Java-ML) library
  • Learn to apply core Java and popular libraries, such as OpenNLP, Stanford CoreNLP, Mallet, and Weka
  • Learn to apply big data platforms for machine learning, such as Apache Mahout and Spark MLlib
  • Familiarize yourself with the very basics of deep learning using the Deep Learning for Java (DL4j) library
  • Learn to use the GRAL package to generate an appealing and informative display based on data
Requirements
  • Should be familiar with the fundamentals of data science.

If you are looking to build data science models that are fit for production, Java is a strong choice. With the aid of robust libraries such as MLlib, Weka, DL4j, and more, you can efficiently perform all the data science tasks you need. This course shows you how to retrieve data from data sources with different levels of complexity and how to handle big data to extract meaningful insights. Later, we will dive into visualizing data to uncover trends and hidden relationships. Finally, we will work through videos that solve the problems you meet when taking data science to production, writing distributed data science applications, and much more: things that will come in handy at work.

About the Author

Rushdi Shams has a Ph.D. from Western University, Canada, on the application of machine learning to Natural Language Processing (NLP) problem areas. Before starting work as a machine learning and NLP specialist in industry, he taught undergraduate and graduate courses. He maintains a YouTube channel named "Learn with Rushdi" for learning computer technologies.

Who is the target audience?
  • This course is for Java developers who are familiar with the fundamentals of data science and want to improve their skills to become a pro.
Curriculum For This Course
39 Lectures
Data Operations
7 Lectures 24:22

This video will give an overview of the entire course.

Preview 02:31

Weka's native file format is called Attribute-Relation File Format (ARFF). Let’s learn in detail about this file and how to work with it.

Creating and Saving an ARFF File
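
As a rough illustration of what an ARFF file contains, the sketch below builds one by hand in plain Java rather than through Weka's own classes. The class name ArffSketch, the attribute names x and y, and the yes/no class labels are assumptions made for this example and do not come from the course.

```java
import java.util.Arrays;
import java.util.List;

/** Hand-rolled ARFF writer for a toy dataset; a sketch, not Weka's own API. */
final class ArffSketch {

    /** Builds ARFF text: a header (@relation, @attribute lines) followed by @data rows. */
    static String toArff(String relation, List<double[]> rows, List<String> labels) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append("\n\n");
        sb.append("@attribute x numeric\n");         // hypothetical attribute name
        sb.append("@attribute y numeric\n");         // hypothetical attribute name
        sb.append("@attribute class {yes,no}\n\n");  // nominal attribute with its allowed values
        sb.append("@data\n");
        for (int i = 0; i < rows.size(); i++) {
            double[] row = rows.get(i);
            // One comma-separated line per instance, in attribute order.
            sb.append(row[0]).append(',').append(row[1]).append(',')
              .append(labels.get(i)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<double[]> rows = Arrays.asList(new double[] {1.0, 2.0}, new double[] {3.5, 0.5});
        System.out.print(toArff("toy", rows, Arrays.asList("yes", "no")));
    }
}
```

The returned text can be written to a file with an .arff extension and then opened in Weka.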

In this video, we will create four methods: one loads an ARFF file, the second reads the data in the ARFF file and generates a machine learning model, the third saves the model using serialization, and the last evaluates the model on the ARFF file.

Cross-Validating a Machine Learning Model

The classic supervised machine-learning classification task is to train a classifier on the labeled training instances and to apply the classifier on unseen test instances.

Classifying Unseen Test Data

Many times, you will need to use a filter before you develop a classifier. The filter can be used to remove, transform, discretize, and add attributes; remove misclassified instances, randomize, or normalize instances; and so on. Let’s see how we can use a filter and classifier at the same time to classify unseen test examples.

Classifying Unseen Test Data with a Filtered Classifier

Most linear regression modeling follows a general pattern: many independent variables collectively produce a result, which is the dependent variable. Let’s use Weka's linear regression classifier.

Generating Linear Regression Models
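
To make the idea of a dependent variable concrete, here is a minimal sketch of ordinary least squares with a single independent variable, written in plain Java. It is not Weka's linear regression classifier, which handles many attributes; the class and method names are assumptions for this example.

```java
/** Least-squares fit of y = intercept + slope * x; a plain-Java sketch, not Weka's classifier. */
final class LinearRegressionSketch {

    /** Returns {intercept, slope} that minimize the squared error over the given points. */
    static double[] fit(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double sxy = 0, sxx = 0;
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - meanX) * (y[i] - meanY);  // covariance term
            sxx += (x[i] - meanX) * (x[i] - meanX);  // variance term
        }
        double slope = sxy / sxx;
        return new double[] {meanY - slope * meanX, slope};
    }

    public static void main(String[] args) {
        double[] model = fit(new double[] {0, 1, 2, 3}, new double[] {1, 3, 5, 7});
        System.out.println("intercept=" + model[0] + ", slope=" + model[1]);
    }
}
```

The sketch assumes at least two distinct x values; otherwise the variance term is zero and the slope is undefined.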

Weka has a class named Logistic, which can be used to build and use a multinomial logistic regression model with a ridge estimator. We will explore how to use Weka to generate a logistic regression model.

Generating Logistic Regression Models
Clustering and Feature Selection
4 Lectures 11:33

In this video, we will use the K-means algorithm to cluster or group data points of a dataset together.

Preview 02:18
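
The grouping step can be sketched in plain Java as follows. This is a simplified one-dimensional version of the standard K-means iteration (Lloyd's algorithm) with caller-supplied starting centroids, not Weka's implementation; all names are assumptions for this example.

```java
/** One-dimensional K-means (Lloyd's algorithm); a plain-Java sketch, not Weka's clusterer. */
final class KMeansSketch {

    /** Alternates assignment and centroid update until nothing moves or the cap is hit. */
    static double[] cluster(double[] points, double[] initialCentroids, int maxIterations) {
        double[] centroids = initialCentroids.clone();
        for (int iter = 0; iter < maxIterations; iter++) {
            double[] sum = new double[centroids.length];
            int[] count = new int[centroids.length];
            for (double p : points) {
                int best = nearest(centroids, p);  // assignment step
                sum[best] += p;
                count[best]++;
            }
            boolean changed = false;
            for (int c = 0; c < centroids.length; c++) {
                if (count[c] == 0) continue;       // an empty cluster keeps its old centroid
                double updated = sum[c] / count[c]; // update step: mean of assigned points
                if (updated != centroids[c]) changed = true;
                centroids[c] = updated;
            }
            if (!changed) break;
        }
        return centroids;
    }

    /** Index of the centroid closest to p. */
    static int nearest(double[] centroids, double p) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (Math.abs(p - centroids[c]) < Math.abs(p - centroids[best])) best = c;
        }
        return best;
    }

    public static void main(String[] args) {
        double[] result = cluster(new double[] {1, 2, 3, 10, 11, 12}, new double[] {0, 5}, 100);
        System.out.println(java.util.Arrays.toString(result));
    }
}
```

Real datasets have several attributes per point, so a full implementation would use a multi-dimensional distance such as Euclidean distance.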

If you have a dataset with classes, which is an unusual case for unsupervised learning, Weka has a method called clustering from classes. We will cover this method in this video.

Clustering Data from Classes

Association rule learning is a machine learning technique to discover associations and rules between various features or variables in a dataset. Let’s see how we can use Weka to learn association rules from datasets.

Learning Association Rules from Data

In Weka, there are three ways of selecting attributes. This video uses all three: the low-level attribute selection method, attribute selection using a filter, and attribute selection using a meta-classifier.

Selecting Features and Attributes
Learning from Data
4 Lectures 26:40

The Java Machine Learning (Java-ML) library is a collection of standard machine learning algorithms. Unlike Weka, the library does not have a GUI because it is aimed primarily at software developers.

Preview 11:54

The Stanford classifier is a machine learning classifier developed at Stanford University by the Stanford Natural Language Processing group. The software is implemented in Java and uses maximum entropy as its classifier.

Classifying Data Points Using the Stanford Classifier

Massive Online Analysis or MOA is related to Weka, but it comes with more scalability. It is a notable Java workbench for data stream mining. With a strong community in place, MOA has implementations of classification, clustering, regression, concept drift identification, and recommender systems.

Classifying Data Points Using Massive Online Analysis (MOA)

So far, we have seen multiclass classifications that aim to classify a data instance into one of the several classes. Multilabeled data instances are data instances that can have multiple classes or labels. The machine learning tools that we have used so far are not capable of handling data points that have this characteristic of having multiple target classes.

Classifying Multilabeled Data Points Using Mulan
Retrieving Information from Text Data
8 Lectures 33:11

One of the most common tasks that a data scientist needs to do using text data is to detect tokens from it. This task is called tokenization.

Preview 04:13
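
One simple way to tokenize with nothing but the JDK is to split on runs of characters that are neither letters nor digits. The sketch below takes that approach; it is an assumption about one reasonable technique, not necessarily the one used in the video, and the class name is hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Regex-based tokenizer using only the JDK; a sketch of one possible approach. */
final class TokenizerSketch {

    /** Splits on runs of characters that are neither letters nor decimal digits. */
    static List<String> tokenize(String text) {
        return Arrays.stream(text.split("[^\\p{L}\\p{Nd}]+"))
                .filter(token -> !token.isEmpty())  // drop the empty piece a leading separator produces
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Hello, world! It works."));
    }
}
```

A split like this breaks contractions such as "don't" into two tokens, which is one reason dedicated NLP libraries are often preferred.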

Sentences are a very important text unit for data scientists experimenting with routine exercises such as classification. In this video, we will see how we can detect sentences so that we can use them for further analysis.

Detecting Sentences Using Java
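
The JDK ships a sentence boundary detector in java.text.BreakIterator. The following is a minimal sketch of using it; whether the video relies on this exact class is an assumption, and the class name SentenceSketch is hypothetical.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Sentence splitting with the JDK's BreakIterator; a sketch, not the course's code. */
final class SentenceSketch {

    /** Walks consecutive sentence boundaries and collects the trimmed text between them. */
    static List<String> sentences(String text) {
        List<String> result = new ArrayList<>();
        BreakIterator boundaries = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        boundaries.setText(text);
        int start = boundaries.first();
        for (int end = boundaries.next(); end != BreakIterator.DONE;
                start = end, end = boundaries.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) result.add(sentence);
        }
        return result;
    }

    public static void main(String[] args) {
        sentences("This is one. This is two! Is this three?").forEach(System.out::println);
    }
}
```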

The preceding two videos in this section detected tokens and sentences using legacy Java classes and their methods. In this video, we will combine the two tasks of detecting tokens and sentences using OpenNLP, an open source library from Apache.

Detecting Tokens (words) and Sentences Using OpenNLP

Now that we know how to extract tokens or words from a given text, we will see how we can get different types of information from the tokens, such as their lemmas and part of speech, and whether the token is a named entity.

Retrieving Lemma and Part of Speech, and Recognizing Named Entities from Tokens

Data scientists often measure the distance or similarity between two data points for classification, clustering, detecting outliers, and for many other cases. When they deal with texts as data points, the traditional distance or similarity measurements cannot be used.

Measuring Text Similarity with Cosine Similarity Measure Using Java 8
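
Cosine similarity treats each text as a vector of term counts and measures the angle between the two vectors. The sketch below uses Java 8 streams with a naive lowercase word split; the class name and the tokenization rule are assumptions for this example.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Cosine similarity over raw term counts, using Java 8 streams; a minimal sketch. */
final class CosineSketch {

    /** Counts how often each lowercase word occurs in the text. */
    static Map<String, Long> termFrequencies(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("\\W+"))
                .filter(term -> !term.isEmpty())
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    /** Euclidean length of a term-count vector. */
    static double norm(Map<String, Long> frequencies) {
        return Math.sqrt(frequencies.values().stream().mapToDouble(v -> v * v).sum());
    }

    /** Dot product of the two count vectors divided by the product of their lengths. */
    static double cosine(String a, String b) {
        Map<String, Long> fa = termFrequencies(a);
        Map<String, Long> fb = termFrequencies(b);
        double dot = 0;
        for (Map.Entry<String, Long> e : fa.entrySet()) {
            dot += e.getValue() * fb.getOrDefault(e.getKey(), 0L);  // only shared terms contribute
        }
        double na = norm(fa);
        double nb = norm(fb);
        return (na == 0 || nb == 0) ? 0.0 : dot / (na * nb);
    }

    public static void main(String[] args) {
        System.out.println(cosine("the cat sat on the mat", "the cat lay on the rug"));
    }
}
```

With non-negative counts the result lies between 0 (no shared terms) and 1 (identical term distributions).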

With an ever-increasing number of documents in text format, an important task for any data scientist is to get an overview of a large number of articles through abstracts, summaries, or lists of abstract topics. The aim is not so much to save reading time as to support clustering, classification, semantic relatedness measurement, and sentiment analysis.

Extracting Topics from Text Documents Using Mallet

Our final two videos in this section cover the classical machine learning classification problem, that is, the classification of documents using language modeling. In this video, we will use Mallet and its command-line interface to train a model and apply it to unseen test data.

Classifying Text Documents Using Mallet

We used Weka to classify data points that are not in a text format. Weka is a very useful tool to classify text documents using machine learning models as well. This video will demonstrate how you can use Weka 3 to develop a document classification model.

Classifying Text Documents Using Weka
Handling Big Data
6 Lectures 19:05

In this video, we will use the Apache Mahout Java library to train an online logistic regression model.

Preview 04:28

This video will demonstrate how to apply an online logistic regression model to unseen, unlabeled test data using Apache Mahout.

Applying an Online Logistic Regression Model Using Apache Mahout

In this video, we will demonstrate how to use Apache Spark to solve very simple data problems. Of course, the data problems are merely dummy problems and not real-world problems, but this can be a starting point for you to understand intuitively the use of Apache Spark on a large scale.

Solving Simple Text Mining Problems with Apache Spark

MLlib is the machine learning component of Apache Spark and is a competitive (even better) alternative to Apache Mahout. This video will demonstrate how we can cluster unlabeled data points using the K-means algorithm with MLlib.

Clustering Using K-means Algorithm with MLlib

In this video, we will explore how to build a linear regression model with MLlib.

Creating a Linear Regression Model with MLlib

In this video, we will demonstrate how you can classify data points using the random forest algorithm with MLlib.

Classifying Data Points with Random Forest Model Using MLlib
Learn Deeply from Data
3 Lectures 12:58

Word2vec can be seen as a two-layer neural net that works with natural text. In this video, we will use the Java deep learning library named Deep Learning for Java (DL4j) to apply Word2vec to raw text.

Preview 05:33

A deep-belief network can be defined as a stack of restricted Boltzmann machines, where each RBM layer communicates with both the previous and subsequent layers. In this video, we will see how to create such a network.

Creating a Deep Belief Neural Net

A deep autoencoder is a deep neural network that is composed of two deep-belief networks that are symmetrical. In this video, we will develop a deep autoencoder consisting of one input layer, four decoding layers, four encoding layers, and one output layer.

Creating a Deep Autoencoder
Visualizing Data
7 Lectures 21:57

Sine graphs can be particularly useful for data scientists because they are trigonometric graphs that can be used to model fluctuations in data. In this video, we will use a free Java graph library named GRAphing Library (GRAL) to plot a 2D sine graph.

Preview 03:39

Histograms are a very popular method of discovering the frequency distribution of a set of continuous data. Let’s take a look at how to plot histograms using GRAL.

Plotting Histograms
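
Before a histogram can be drawn, the continuous values have to be counted into bins. The sketch below shows that counting step in plain Java; it does not use GRAL, and the class name, method name, and equal-width binning rule are assumptions for this example.

```java
import java.util.Arrays;

/** Equal-width binning of continuous values; the counting step behind a histogram. */
final class HistogramSketch {

    /** Counts values into `bins` equal-width bins over [min, max]; out-of-range values are skipped. */
    static int[] counts(double[] values, double min, double max, int bins) {
        int[] counts = new int[bins];
        double width = (max - min) / bins;
        for (double v : values) {
            if (v < min || v > max) continue;
            int bin = (int) ((v - min) / width);
            if (bin == bins) bin--;  // a value equal to max belongs to the last bin
            counts[bin]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        double[] data = {0, 0.5, 1, 1.5, 2, 3, 4};
        System.out.println(Arrays.toString(counts(data, 0, 4, 4)));
    }
}
```

The resulting counts are what a plotting library would render as bar heights.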

Bar plots are one of the most common graph types used by data scientists, and drawing one with GRAL is simple. In this video, we will use GRAL to plot a bar chart.

Plotting a Bar Chart

Box plots are another effective visualization tool for data scientists. They give important descriptive statistics of a data distribution. In this video, we will explore drawing box plots for data distributions.

Plotting Box Plots or Whisker Diagrams

Scatter plots use both the x and y axes to plot data points and are a good means to demonstrate the correlation between variables. This video will demonstrate how to use GRAL to draw scatter plots for 100,000 random data points.

Plotting Scatter Plots

Donut plots, a version of the pie chart, are a popular data visualization technique that shows proportions in your data. Let’s learn to plot donut plots for 10 random variables.

Plotting Donut Plots

Area graphs are useful tools to display how quantitative values develop over a given interval. In this video, we will use the GRAL Java library to plot area graphs.

Plotting Area Graphs
About the Instructor
Packt Publishing
3.9 Average rating
8,274 Reviews
59,285 Students
688 Courses
Tech Knowledge in Motion

Packt has been committed to developer learning since 2004. A lot has changed in software since then - but Packt has remained responsive to these changes, continuing to look forward at the trends and tools defining the way we work and live, and at how to put them to work.

With an extensive library of content - more than 4000 books and video courses - Packt's mission is to help developers stay relevant in a rapidly changing world. From new web frameworks and programming languages to cutting-edge data analytics and DevOps, Packt takes software professionals in every field to what's important to them now.

From skills that will help you to develop and future-proof your career to immediate solutions to everyday tech challenges, Packt is a go-to resource to make you a better, smarter developer.

Packt Udemy courses continue this tradition, bringing you comprehensive yet concise video courses straight from the experts.