Hadoop & Data Science NLP (All in One Course).

Rating: 3.3 (14 reviews)

Learn to develop real-world applications using Hadoop (NIFI, Solr, Banana Dashboard, Hive, Zeppelin) & Data Science NLP.

Created by Nitin Kaushik

Last updated 4/2018

English

What you'll learn

You will be able to develop a real-world, end-to-end application that encompasses both Hadoop and Natural Language Processing (Data Science).
Set up a Hadoop Cluster on your laptop free of cost and then connect to different Hadoop services.
Develop distributed applications based on the Hadoop Framework, the different Hadoop pillars, HDFS Architecture, MapReduce and the different types of Data in Hadoop.
Visualize Hadoop ecosystem services as well as components like Memory usage, Cluster Load etc. in the form of a dashboard on a Web Interface called Ambari.
Design and Develop scalable, fault-tolerant and flexible applications which can store and distribute large data sets across inexpensive servers.
Develop scripts based on several commands in Hadoop to manage files and datasets.
Understand the different building blocks of Apache NIFI that help with data movement, transformation etc. Also learn about NIFI Architecture and its various applications.
Install Apache NIFI and change its configuration files so that it runs seamlessly.
Develop a complete workflow application in NIFI which can take data from a streaming source, perform transformations on this data and then store it in Hadoop.
Spin up Apache Solr as one of the services and configure it to receive streaming data from a NIFI processor to perform real-time analytics on this data.
Understand the architecture and concepts related to Apache Solr as well as several of its features.
Create a Banana Dashboard to visualize real-time analytics on live streaming data, after getting an understanding of the components and structure of Banana Dashboard.
Visualize where Hive fits in the Hadoop Ecosystem, its Architecture, and how exactly it works.
Develop an understanding of how data can be stored in structured form in Apache Hive, with in-depth knowledge of several of its components.
Develop and Visualize the data in the form of Graphs, Histograms, Pie Charts etc. using another Hadoop Ecosystem tool (notebook) called Apache Zeppelin.
Develop the concepts of Natural Language Processing and integrate them all to develop a working NLP application.
Develop basic building blocks of Natural Language Processing and write the associated Python scripts.
Build a machine learning model using Python for the application being built.

Course content

11 sections • 51 lectures • 11h 30m total length

Course Introduction (0:07)
General Overview of Hadoop (6:37)
Explore big data concepts and Hadoop's role in storing and processing vast data across clusters, with structured, semi-structured, and unstructured data types and actionable insights.
A quick look at Hadoop History (4:58)
Hadoop Framework and Ecosystem (13:44)
Let's learn about HDFS and MapReduce (13:44)
Peek into Hadoop YARN (4:28)
Hadoop Quiz

Download Hadoop and other supporting tools on your Desktop/Laptop (8:27)
Install Hadoop and make Configuration changes (11:43)
Access Hadoop Sandbox and Welcome Page (12:40)
Let's do some hands-on with Hadoop Operations (22:47)
Explore basic Hadoop operations and practice commands for listing, copying, moving, creating directories, navigating, and configuring file permissions across local and distributed environments.
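The operations named in this section correspond to `hdfs dfs` subcommands. As a minimal sketch (the helper name `hdfs_cmd` and every path are hypothetical, not taken from the course), the Python below only builds the argument lists and prints them, so it runs even on a machine without Hadoop installed:

```python
import shlex

def hdfs_cmd(action, *args):
    """Return the argv list for an `hdfs dfs` subcommand (nothing is executed)."""
    return ["hdfs", "dfs", f"-{action}", *args]

# Hypothetical paths, chosen only for illustration.
commands = [
    hdfs_cmd("mkdir", "-p", "/user/demo/tweets"),                # create a directory
    hdfs_cmd("put", "tweets.json", "/user/demo/tweets/"),        # copy local -> HDFS
    hdfs_cmd("ls", "/user/demo/tweets"),                         # list a directory
    hdfs_cmd("mv", "/user/demo/tweets/tweets.json",
             "/user/demo/archive/"),                             # move within HDFS
    hdfs_cmd("chmod", "640", "/user/demo/archive/tweets.json"),  # set permissions
]

for argv in commands:
    print(shlex.join(argv))
```

On a machine with the Hadoop client on the PATH, each list could be passed to `subprocess.run(argv, check=True)` to actually run the command.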

NIFI Concepts (7:20)
Explore Apache NiFi concepts, including flow files, flow file processors, ports and connections, and flow control, and learn how processors route, transform, and orchestrate data between systems.
Acquire knowledge on Apache NIFI's UI Canvas Components (5:23)
Explore Apache NIFI's UI canvas components, including the top toolbar, status bar, operator palette, search, and global menu, plus bird's eye view, breadcrumbs, and templates for data flow navigation.
Apache NIFI Architecture (3:22)
Apache NIFI Quiz
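To make the flow file and processor vocabulary concrete, here is a toy model in plain Python. It is not NiFi's API: the `FlowFile` class, the two functions, the `ascii_only` attribute, and the relationship names are all invented for illustration. It only mirrors the idea that a flow file carries content plus attributes, and that processors either transform a flow file or route it onto a connection.

```python
from dataclasses import dataclass, field

@dataclass
class FlowFile:
    """Toy stand-in for a flow file: content bytes plus key/value attributes."""
    content: bytes
    attributes: dict = field(default_factory=dict)

def tag_ascii(ff):
    """Transform step: add an attribute without changing the content."""
    is_ascii = ff.content.decode("utf-8").isascii()
    return FlowFile(ff.content, {**ff.attributes, "ascii_only": str(is_ascii).lower()})

def route_on_attribute(ff, key, value):
    """Routing step: name the connection the flow file should follow."""
    return "matched" if ff.attributes.get(key) == value else "unmatched"

# Connections are modelled as plain lists acting as queues between steps.
queues = {"matched": [], "unmatched": []}
incoming = [FlowFile(b"hello hadoop"), FlowFile("h\u00e9llo".encode("utf-8"))]

for ff in incoming:
    ff = tag_ascii(ff)
    queues[route_on_attribute(ff, "ascii_only", "true")].append(ff)

print({name: len(queue) for name, queue in queues.items()})
```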

An introduction to Apache Solr and some of its features (4:36)
Learn Basics and Components of Search Engine (2:07)
How a Search Engine works (3:24)
Peek into the Architecture of Apache Solr (4:10)
Apache Solr - Basic Concepts (5:27)
Explore Apache Solr basic concepts and configuration files, including core, schema, and memory management, and understand distributed SolrCloud terms like node, collection, shard, replica, and leader using ZooKeeper.
Apache Solr Quiz
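Once a collection exists, searching it is an HTTP request to Solr's `/select` handler. The sketch below only builds such a URL with the standard library; the host, the collection name `tweets`, and the field name `text` are assumptions for illustration, and nothing is sent over the network.

```python
from urllib.parse import urlencode

def solr_select_url(base_url, collection, query, rows=10):
    """Build a URL for the /select request handler of a Solr collection."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return f"{base_url}/solr/{collection}/select?{urlencode(params)}"

# Hypothetical host, collection, and field; 8983 is Solr's default port.
url = solr_select_url("http://localhost:8983", "tweets", "text:hadoop", rows=5)
print(url)
```

The colon in `text:hadoop` is percent-encoded by `urlencode`, which Solr decodes back into a field query.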

An Introduction to Apache Hive (6:21)
Apache Hive Architecture (5:17)
How does Apache Hive work? (4:11)
Learn how Hive processes a query end to end, from the command line driver and decompiler to the compiler, plan, and execution on the execution engine as a MapReduce job, then fetch results.
Apache Hive Data Types (3:06)
Apache Hive - Create Database and Table (31:52)
Apache Hive - Table Partitioning (22:53)
Apache Hive - Operators and Functions (20:06)
Apache Hive - Views and Indexes (14:06)
Setup Hive Tables to receive JSON Format Data (14:08)
Set up Hive tables to receive JSON format data by following step-by-step installation, patch extraction, directory setup, and service restart processes described in the lecture.
Create Hive Tables and Views for storing JSON Format Data (1:16:49)
Visualize Data using Apache Zeppelin (19:21)
Apache Hive Quiz
Hadoop and Ecosystem Assignment
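Table partitioning in Hive comes down to a `PARTITIONED BY` clause in the table definition. A minimal sketch, assuming a hypothetical `tweets` table whose column names are invented for illustration (the course's own table definitions are not shown in this listing). The Python helper only renders the HiveQL string and does not connect to Hive.

```python
def create_table_ddl(table, columns, partition_cols):
    """Render a HiveQL CREATE TABLE statement for a partitioned table."""
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns)
    parts = ", ".join(f"{name} {dtype}" for name, dtype in partition_cols)
    return (
        f"CREATE TABLE IF NOT EXISTS {table} ({cols}) "
        f"PARTITIONED BY ({parts}) "
        "STORED AS ORC"
    )

# Hypothetical table and columns; partition columns are listed separately
# because Hive stores them as directory names rather than inside the data files.
ddl = create_table_ddl(
    "tweets",
    [("id", "BIGINT"), ("user_name", "STRING"), ("tweet_text", "STRING")],
    [("created_date", "STRING")],
)
print(ddl)
```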

NLP - Tokenizing Words and Sentences (17:09)
NLP - Word Stemming (15:32)
Explore word stemming as a preprocessing step that reduces variants by trimming suffixes (e.g., shining to shine) using the Porter stemmer and tokenization before NLP analysis.
NLP - Get an understanding of Stopwords (11:24)
NLP - Dive into Part of Speech Tagging (13:21)
NLP - Locate and Classify entities using Named Entity Recognition (5:47)
NLP - Understand the concept of Lemmatization (6:44)
NLP - Build an Algorithmic classifier to classify the Text (19:51)
NLP - Importance of Words as Features (11:07)
NLP - Train a Machine Learning model using Naive Bayes Algorithm (12:13)
NLP - Get the Machine Learning model loaded faster using Pickling (7:49)
Learn how to save and reuse a trained machine learning model with Python pickle, enabling fast loading and prediction of sentiment without retraining.
NLP - Putting everything together for Sentiment Analysis (35:38)
NLP - Real Time Live Twitter Sentiment Analysis (27:09)
Learn to build a real-time Twitter sentiment analysis pipeline that streams live tweets, classifies sentiment as positive or negative, and uses pickling to speed up repeated predictions.
NLP - Plotting Live Twitter Sentiments (12:02)
Natural Language Processing Quiz
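Several of these lectures fit together in a few lines: tokenizing, dropping stopwords, using words as features, training Naive Bayes, and pickling the result. The sketch below is a standard-library-only illustration, not the course's code; the stopword list, the four training sentences, and all function names are invented, and a real project would more likely use an NLP library.

```python
import math
import pickle
import re
from collections import Counter

STOPWORDS = {"a", "an", "and", "i", "is", "it", "the", "this", "was"}

def tokenize(text):
    """Lowercase the text, keep runs of letters/apostrophes, drop stopwords."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]

def train(docs, labels):
    """Count words per label; the returned dict is the whole model."""
    word_counts = {label: Counter() for label in set(labels)}
    for doc, label in zip(docs, labels):
        word_counts[label].update(tokenize(doc))
    vocab = set().union(*word_counts.values())
    return {"label_counts": Counter(labels), "word_counts": word_counts, "vocab": vocab}

def classify(model, text):
    """Return the label with the highest log-probability (add-one smoothing)."""
    total_docs = sum(model["label_counts"].values())
    vocab_size = len(model["vocab"])
    scores = {}
    for label, n_docs in model["label_counts"].items():
        counts = model["word_counts"][label]
        denom = sum(counts.values()) + vocab_size
        score = math.log(n_docs / total_docs)
        for word in tokenize(text):
            if word in model["vocab"]:  # ignore words never seen in training
                score += math.log((counts[word] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)

# Toy training data, invented for illustration.
docs = [
    "I love this great movie",
    "what a wonderful happy day",
    "this was a terrible awful film",
    "I hate this boring plot",
]
labels = ["pos", "pos", "neg", "neg"]
model = train(docs, labels)

# Pickle round trip: serialize once, reload later without retraining.
blob = pickle.dumps(model)
restored = pickle.loads(blob)

print(classify(restored, "a great and happy day"))
print(classify(restored, "boring and awful"))
```

Storing the model as a plain dict keeps the pickle independent of any custom class, so it can be loaded from a different script.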

Requirements

Basic Python Programming
A computer with at least 8 GB of RAM

Description

The demand for Big Data Hadoop Developers, Architects, Data Scientists, and Machine Learning Engineers is increasing day by day, and one of the main reasons is that companies are keener these days to get more accurate predictions and forecasting results from data. They want to make sense of data and provide a 360-degree view of customers, thereby delivering a better customer experience.

This course is designed to give you the best of both worlds, i.e. both Hadoop and Data Science. You will not only be able to perform Hadoop operations to gather data directly from the source, but also perform Data Science tasks and build a model on the data collected. You will also be able to do transformations using Hadoop Ecosystem tools. In a nutshell, this course will help you learn both Hadoop and Data Science Natural Language Processing in one course.

Companies like Google, Amazon, Facebook, eBay, LinkedIn, Twitter, and Yahoo! are using Hadoop on a large scale these days, and more and more companies have already started adopting these technologies. Text Analytics has several applications (given below), and hence companies prefer professionals who have both of these skill sets.

One application of text classification is a faster emergency response system, which can be developed by classifying panic conversations on social media.
Another application is automating the classification of users into cohorts so that marketers can monitor and classify users based on how they are talking about products, services or brands online.
Content or product tagging using categories as a way to improve the browsing experience or to identify related content on the website. Platforms such as news agencies, directories, e-commerce sites, blogs, content curators, and the like can use automated technologies to classify and tag content and products.

Companies these days are leaning towards candidates who are equipped with the best of both worlds, and this course will prove to be a very good starting point. This course covers the complete pipeline of modern-day ELT (Extract, Load and Transform) and Analytics as shown below:

Get data from Source --> Load data into Structured/Semi Structured/Unstructured form --> Perform Transformations --> Pre-process the Data further --> Build the Data Science Model --> Visualize the Results

Learn and get started with the popular Hadoop Ecosystem technologies as well as one of the hottest topics in Data Science, Natural Language Processing. In this course you will:

Do Hadoop Installation using Hortonworks Sandbox. You will also get an opportunity to do some hands-on work with Hadoop operations as well as the Hadoop Management Service called Ambari on your computer.
Perform HDFS operations to work with a continuous stream of data.
Install SSH and File Transfer related tools, which help in the operational activities of Hadoop.
Perform NIFI installation and develop a complete workflow on the Web UI to move the data from source to destination. Also, perform transformations on this data using NIFI processors.
Spin up Apache Solr, which allows full text search and also receives text for performing Real Time Text Analysis.
Engage Banana Dashboard to visualize Real Time Analytics on streaming data.
Store the Real Time streaming JSON data in structured form using Hive Tables as well as in flat file format in HDFS.
Visualize the data in the form of Charts and Histograms using Apache Zeppelin.
Learn the Building blocks of Natural Language Processing to develop Text Analytics Skills.
Unleash the Machine Learning capabilities using Data Science Natural Language Processing and build a Machine Learning Model to classify Text Data.

Who this course is for:

Anyone who wants to learn both Hadoop and Data Science from scratch.
Developers, Programmers or Database Administrators who want to transition to Hadoop and Hadoop Ecosystem tools like HDFS, Hive, Solr, NIFI, Banana and also want to explore Data Science.
Aspiring Data Scientists, Data Analysts, Business Analysts who want to add Natural Language Processing to their arsenal and also want to learn Hadoop.
Product, Program or Project Managers who want to understand the complete architecture as well as how Hadoop and Data Science can be integrated.
Enterprise Architects, Solution Architects who want to learn about the Hadoop Ecosystem and related technologies to design Big Data related solutions.

Course content

Introduction to Hadoop (6 lectures • 44min)
Let's tame the Elephant - Install Hadoop Sandbox and Run few Hadoop commands (4 lectures • 56min)
The Niagara Files - Introduction to Apache NIFI (3 lectures • 16min)
Install and Configure NIFI (2 lectures • 31min)
Full Text Search with Apache Solr - An Introduction (5 lectures • 20min)
Install and Configure Apache Solr (1 lecture • 28min)
Twitter App Setup for bringing data into Hadoop (1 lecture • 11min)
Banana Dashboard for visualizing real time streaming data (2 lectures • 1hr 12min)
Apache Hive (11 lectures • 3hr 38min)
Data Science - Natural Language Processing (13 lectures • 3hr 16min)
