Spark NLP for Data Scientists

Rating: 4.7 (66 reviews)

Unlock your NLP power with Spark NLP, the most popular NLP library in enterprises

Created by Ace Vo, David Talby, Jiri Dobes, Veysel Kocaman

Last updated 6/2023

English

What you'll learn

Utilize 20,000+ State-of-the-Art NLP models in 200+ languages
Train & tune your own NLP models by applying Spark NLP's predefined classifier architectures to your own datasets
Perform popular NLU tasks in one line of code, such as generating text, summarizing text, and answering questions
Deploy models as APIs with NLP Server, a Docker container that contains all of Spark NLP's capabilities

Course content

35 sections • 78 lectures • 12h 34m total length

Spark NLP for Data Scientists overview • 3:53
Welcome to Spark NLP for Data Scientists. We are excited to bring the technology to you. In this video we provide a quick overview of our technology.
Spark NLP Course Structure • 4:00
Here we introduce our course structure so you know what to expect.

Context Spell Checker part 1 • 7:28
Spell checking is an important task in any NLP pipeline that has to deal with noisy or incorrect data. Beyond text produced by Optical Character Recognition (OCR), user-generated content such as tweets, instant messages, and blog posts is a common source of spelling errors. Correcting those errors early reduces vocabulary size at later stages of the pipeline and improves the performance of the downstream models.
Spell checkers can recommend corrections on three levels: subword level, word level, and sentence level. Spark NLP's ContextSpellChecker annotator uses contextual information both to detect errors and to produce the best corrections.
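To make the idea of using context to choose a correction concrete, here is a minimal pure-Python sketch. It is not the ContextSpellChecker implementation: the vocabulary, the bigram counts, and the function names are all made up for illustration, and only the word to the left is used as context.

```python
# Toy context-aware correction. NOT Spark NLP's ContextSpellChecker.
# VOCAB and BIGRAMS are invented illustration data.
VOCAB = {"the", "see", "sea", "i", "want", "to", "swim", "in", "movie", "a"}
BIGRAMS = {("the", "sea"): 5, ("to", "see"): 7, ("see", "a"): 4, ("in", "the"): 9}


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling dynamic-programming row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def correct(tokens):
    """Replace out-of-vocabulary tokens using the previous word as context."""
    out = []
    for tok in tokens:
        if tok in VOCAB:
            out.append(tok)
            continue
        # Word-level candidates: vocabulary entries one edit away.
        candidates = [w for w in VOCAB if edit_distance(tok, w) <= 1]
        if not candidates:
            out.append(tok)  # nothing close enough, keep the token as-is
            continue
        left = out[-1] if out else None
        # Prefer the candidate seen most often after the left neighbour.
        best = max(candidates, key=lambda w: (BIGRAMS.get((left, w), 0), w))
        out.append(best)
    return out


print(correct("i want to swim in the sed".split()))
print(correct("i want to sed a movie".split()))
```

The same misspelling, "sed", is resolved to different words depending on its left neighbour, which is the behaviour a purely word-level checker cannot reproduce.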
Context Spell Checker part 2 • 6:39
Context Spell Checker part 3 • 7:08
Context Spell Checker part 4 • 8:11
NorvigSweeting Spellchecker • 5:34
Learning Objectives:
Understand how to check spelling using NorvigSweeting annotators.
Understand the difference between NorvigSweetingApproach and NorvigSweetingModel.
Customize the use of these annotators by setting their parameters.
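For intuition, the sketch below follows Peter Norvig's well-known edit-distance approach in plain Python. It is not the Spark NLP annotator: the corpus is a made-up sentence and the function names are illustrative. Counting words plays the role of the trainable Approach step, and looking up a correction plays the role of the fitted Model.

```python
from collections import Counter

# Tiny invented corpus; a real checker would count words from a large text.
WORDS = Counter("the quick brown fox jumps over the lazy dog the fox".split())
LETTERS = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """All strings one delete, transpose, replace, or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in LETTERS]
    inserts = [l + c + r for l, r in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def known(words):
    """Keep only the words that appear in the corpus counts."""
    return {w for w in words if w in WORDS}


def correction(word):
    """Most frequent known candidate; unknown words are returned unchanged."""
    candidates = known([word]) or known(edits1(word)) or {word}
    return max(candidates, key=lambda w: (WORDS[w], w))


print(correction("teh"), correction("fux"), correction("dog"))
```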
SymmetricDelete Spellchecker • 5:31
Learning Objectives:
Understand how to check spelling using SymmetricDelete annotators.
Understand the difference between SymmetricDeleteApproach and SymmetricDeleteModel.
Customize the use of these annotators by setting their parameters.
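The symmetric delete idea can be sketched in a few lines of plain Python: instead of generating every possible edit of the input, only deletions are generated, both for the dictionary (once, ahead of time) and for the input word (at lookup time). This is an illustration under stated assumptions, not the annotator's code; the word list and function names are invented, and a full implementation would also verify candidates with a true edit distance.

```python
from itertools import combinations

DICTIONARY = ["spark", "spell", "speak", "model"]  # invented word list


def deletes(word, max_distance=1):
    """The word itself plus every string made by deleting up to max_distance chars."""
    results = {word}
    for k in range(1, min(max_distance, len(word)) + 1):
        for idx in combinations(range(len(word)), k):
            results.add("".join(c for i, c in enumerate(word) if i not in idx))
    return results


def build_index(words, max_distance=1):
    """Precompute a map from each delete variant to the dictionary words producing it."""
    index = {}
    for w in words:
        for d in deletes(w, max_distance):
            index.setdefault(d, set()).add(w)
    return index


def suggest(word, index, max_distance=1):
    """Dictionary words that share at least one delete variant with the input."""
    found = set()
    for d in deletes(word, max_distance):
        found |= index.get(d, set())
    return found


index = build_index(DICTIONARY)
print(suggest("sparc", index), suggest("modle", index))
```

Because the index is built once, each lookup only costs a handful of dictionary probes, which is why this family of checkers is fast on large vocabularies.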

Lemmatizer • 7:50
Learning Objectives:
Understand the process of reducing inflected words to their base forms to obtain the lemmas.
Be able to train custom LemmatizerModel annotators.
Become comfortable with creating pipelines to preprocess texts with Lemmatizer and LemmatizerModel.
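Dictionary-based lemmatization can be pictured as a lookup table from inflected forms to lemmas. The sketch below builds such a table in plain Python. The "lemma -> forms" file layout, the delimiter defaults, and the function names are assumptions made for this example, so check the format your own training dictionary uses.

```python
# Invented dictionary lines in an assumed "lemma -> inflected forms" layout.
RAW = """\
be -> is are was were
run -> runs ran running
mouse -> mice
"""


def build_lemma_table(text, key_delimiter="->", value_delimiter=" "):
    """Map every inflected form to its lemma."""
    table = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        lemma, forms = line.split(key_delimiter, 1)
        lemma = lemma.strip()
        for form in forms.strip().split(value_delimiter):
            if form:
                table[form] = lemma
    return table


def lemmatize(tokens, table):
    """Look up each token (case-insensitively); unknown tokens pass through."""
    return [table.get(t.lower(), t) for t in tokens]


table = build_lemma_table(RAW)
print(lemmatize(["The", "mice", "were", "running"], table))
```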
Stemmer • 5:13
Learning Objectives:
Understand how to extract the base form of words by removing their affixes.
Become comfortable using the different parameters of the annotator.
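As a rough picture of affix removal, the following plain-Python sketch strips a short, hypothetical list of English suffixes. It is far cruder than the rule sets used by real stemmers and is not the algorithm behind the Stemmer annotator; the suffix list and the `min_stem` threshold are invented for illustration.

```python
# Hypothetical suffix rules, longest first so "ingly" wins over "ly".
SUFFIXES = ("ingly", "edly", "ing", "ed", "ly", "es", "s")


def stem(word, min_stem=3):
    """Strip the first matching suffix if at least min_stem characters remain."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= min_stem:
            return w[: -len(suffix)]
    return w


print([stem(w) for w in ["jumping", "jumped", "quickly", "foxes", "running", "is"]])
```

Unlike a lemmatizer, a stemmer needs no dictionary, but the result is not guaranteed to be a real word: here "running" becomes "runn".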

SentenceDetectorDL • 9:38
Learning Objectives:
Understand how the SentenceDetectorDL algorithm works.
Understand how SentenceDetectorDL follows an unsupervised approach which builds upon features extracted from the text.
Become comfortable using the different parameters of the annotator.
Normalizer • 7:45
Learning Objectives:
Understand how to clean tokens by making use of this annotator.
Become comfortable using the different parameters of the annotator.
StopWordsCleaner • 8:44
Learning Objectives:
Understand how to drop stop words from the input sequences.
Become comfortable using the different parameters of the annotator.
Understand how to use pretrained StopWordsCleaner models.

Tokenizer • 13:47
Learning Objectives:
Understand how to use Tokenizer.
Become comfortable using the different parameters of the Tokenizer.
RegexTokenizer • 11:21
Learning Objectives:
Understand how different regex patterns split sequences of words in different ways.
Understand the difference between the regex tokenizer and regular tokenizer.
Become comfortable using the different parameters of the annotator.
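To see how the choice of pattern changes the tokens, the sketch below splits one sentence with two different regular expressions using Python's standard `re` module. It only illustrates pattern-based splitting in general; the helper name and the sample sentence are invented, and the annotator's own parameters are covered in the lecture.

```python
import re

text = "Spark-NLP 4.4 costs $0, e-mail: info@example.com"


def regex_tokenize(s, pattern):
    """Split on every match of pattern and drop empty pieces."""
    return [t for t in re.split(pattern, s) if t]


# Split on runs of whitespace: punctuation stays attached to the tokens.
whitespace_tokens = regex_tokenize(text, r"\s+")
# Split on runs of non-word characters: hyphens, dots, and "@" also split.
non_word_tokens = regex_tokenize(text, r"\W+")

print(whitespace_tokens)
print(non_word_tokens)
```

With the whitespace pattern "info@example.com" survives as one token, while the non-word pattern breaks it into three, so the right pattern depends on what downstream annotators expect.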
ChunkTokenizer • 12:40
Learning Objectives:
Understand how to split chunks into tokens in different ways.
Become comfortable using the different parameters of the annotator.
TokenAssembler • 9:36
Learning Objectives:
Understand how TokenAssembler reconstructs a DOCUMENT type annotation from tokens.
Become comfortable using the different parameters of the annotator.

YAKE keyword extractor • 11:46
Learning Objectives:
Understand what keyword extraction is: the process of automatically extracting the most important keywords from a text document.
Understand how YakeKeywordExtraction follows an unsupervised approach which builds upon features extracted from the text.
Become comfortable using the different parameters of the annotator - most parameters will help define:
total number of keywords to be selected,
minimum or maximum words in a keyword,
list of stopwords.
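The three kinds of parameters listed above can be illustrated with a deliberately simplified extractor. This sketch is not YAKE: it ranks candidate phrases by raw frequency instead of YAKE's statistical features, and the function and parameter names are invented for the example.

```python
import re
from collections import Counter


def extract_keywords(text, stopwords, top_n=3, min_ngrams=1, max_ngrams=2):
    """Frequency-ranked n-grams that neither start nor end with a stopword."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for n in range(min_ngrams, max_ngrams + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] in stopwords or gram[-1] in stopwords:
                continue
            counts[" ".join(gram)] += 1
    # Rank by frequency, then prefer longer phrases, then alphabetical order.
    ranked = sorted(counts, key=lambda k: (-counts[k], -len(k.split()), k))
    return ranked[:top_n]


sample = "Spark NLP is an NLP library. The NLP library runs on Spark."
print(extract_keywords(sample, stopwords={"is", "an", "the", "on"}))
```

Like YAKE, this needs no labeled data or trained model: everything is derived from the document itself.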

Requirements

Hands-on understanding of Python is needed
Recommended: basic understanding of machine learning and natural language processing
Nice to have: basic understanding of Apache Spark

Description

Welcome to the Spark NLP for Data Scientists course!

This course will walk you through building state-of-the-art natural language processing (NLP) solutions using John Snow Labs’ open-source Spark NLP library. The library offers more than 20,000 pretrained models covering 250+ languages. This is a course for data scientists that will enable you to write and run live Python notebooks that cover the majority of the open-source library’s functionality. This includes reusing, training, and combining models for NLP tasks like named entity recognition, text classification, spelling & grammar correction, question answering, knowledge extraction, sentiment analysis, and more.

The course is divided into 11 sections: Text Processing, Information Extraction, Dependency Parsing, Text Representation with Embeddings, Sentiment Analysis, Text Classification, Named Entity Recognition, Question Answering, Multilingual NLP, Advanced Topics (such as speech-to-text recognition), and Utility Tools & Annotators. In addition to video recordings with real code walkthroughs, we also provide sample notebooks to view and experiment with. At the end of the course, you will have the opportunity to take a certification at no cost to you.

The course is also updated periodically to reflect the changes in our models.

We look forward to seeing you in the class, from all of us at John Snow Labs.

Who this course is for:

Data scientists who are looking to use Natural Language Processing at scale
Data scientists looking to build custom natural language understanding applications
Data Analysts who want to apply Natural Language Processing

Course content

Spark NLP Overview • 2 lectures • 8min

Text Preprocessing - Text Normalization with Spell Checker • 6 lectures • 41min

Text Processing - Extracting and Normalizing the Dates • 1 lecture • 16min

Text Preprocessing - NGram Generation • 1 lecture • 5min

Text Preprocessing - Stemming and Lemmatizing • 2 lectures • 13min

Text Preprocessing Models • 3 lectures • 26min

Text cleaning with DocumentNormalizer • 1 lecture • 10min

Text Preprocessing - Text tokenization with Tokenizer • 4 lectures • 47min

Information Extraction - Keyword Extraction • 1 lecture • 12min

Information Extraction with Regular Expression • 1 lecture • 13min
