Text Mining & NLP

Name: Text Mining & NLP
Rating: 4.7 (39 reviews)

Text Mining

Created by AISPRY TUTOR

Last updated 4/2024

English

What you'll learn

Named Entity Recognition (NER) will enable you to automatically identify entities like names, dates, and locations within text, enriching data for further analysis.
A Text Mining course equips you with a wide range of techniques and tools to unlock insights from unstructured text data.
Text classification and clustering techniques will empower you to build models that group and categorize text data automatically.
Understanding the mechanics of Bag of Words, TF-IDF, and word embeddings, and how each approach captures different aspects of the underlying meaning in text.
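As a rough illustration of the text classification outcome listed above, the sketch below trains a multinomial Naive Bayes classifier on bag-of-words counts. It is written from scratch in plain Python so each step is visible. The training sentences, the labels "complaint" and "praise", and the helper names (tokenize, train_nb, predict_nb) are hypothetical and not taken from the course; libraries such as scikit-learn provide tested implementations of the same model.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    """Lower-case and split on whitespace (deliberately minimal)."""
    return text.lower().split()

def train_nb(docs, labels):
    """Fit multinomial Naive Bayes: log class priors plus per-class word counts."""
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = tokenize(doc)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    priors = {c: math.log(n / len(labels)) for c, n in Counter(labels).items()}
    return priors, word_counts, vocab

def predict_nb(model, doc):
    """Return the class with the highest log posterior, using add-one (Laplace) smoothing."""
    priors, word_counts, vocab = model
    scores = {}
    for label, log_prior in priors.items():
        total = sum(word_counts[label].values())
        score = log_prior
        for token in tokenize(doc):
            if token in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][token] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical training data: two customer complaints and two compliments
train_docs = [
    "refund delayed terrible support",
    "late delivery and poor packaging",
    "great product fast delivery",
    "excellent support very happy",
]
train_labels = ["complaint", "complaint", "praise", "praise"]
model = train_nb(train_docs, train_labels)

print(predict_nb(model, "terrible late refund"))     # complaint
print(predict_nb(model, "happy with fast support"))  # praise
```

Working in log space avoids multiplying many small probabilities, and the add-one smoothing keeps a word that never appeared with a class from forcing that class's probability to zero.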

Course content

23 sections • 83 lectures • 16h 48m total length

Introduction about Tutor (3:15)
Meet Bharani Kumar Depuru, a 16-year industry veteran and Industry 4.0 implementer who has worked with HSBC, ITC Infotech, Infosys, and Deloitte. He leads Innodatatics and 360DigiTMG.

Agenda and stages of Analytics (1:02)
Explore the agenda and stages of analytics within the data science training program. Understand how the project management methodology frames real-world analytics work.
What is Diagnostic Analytics? (1:21)
Discover diagnostic analytics by asking why events happen, such as spikes and drops in COVID-19 cases, and tagging reasons like lockdowns and vaccination.
What is Prescriptive Analytics? (11:41)
Prescriptive analytics uses predictions and what-if analysis to determine actions across industries, from vaccine strategies to automated shutdowns. The lecture also summarizes the four analytics stages: descriptive, diagnostic, predictive, and prescriptive.
What is Predictive Analytics? (1:57)
Explore predictive analytics by using current data to forecast future outcomes, such as disease cases or vaccination rates, while assessing how valid those forecasts remain across time horizons and changing conditions.
What is CRISP-ML(Q)? (3:08)
The lecture introduces CRISP-ML(Q), a six-phase cross-industry standard process for machine learning with quality assurance, covering business and data understanding, data preparation, model building and tuning, evaluation, deployment, and monitoring and maintenance.

Business Understanding - Define Scope Of Application (18:44)
Define the scope of application in the business understanding phase, set objectives and constraints, and use historical data to predict loan default and optimize profits.
Business Understanding - Define Success Criteria (8:13)
Define the business success criteria by tying goals to KPIs, such as reducing loan defaults under 5%, and balance this with machine learning accuracy, performance, and ROI expectations.
Business Understanding - Use Cases (9:59)
Explore business understanding and use cases, balancing fraud reduction with customer convenience in credit card transactions, and illustrate objective setting under cost constraints in agriculture and drone-driven precision farming.

Agenda Data Understanding (0:49)
Preview the data understanding agenda: data types, scales of measurement, key terms, and primary versus secondary data collection techniques for text mining and NLP.
Introduction to Data Understanding (6:18)
Understand data as measured information for analysis and modeling that enables predictions and optimization, using sales scenarios and what-if analysis to support management decisions.
Data Types - Continuous Vs Discrete (11:18)
Explore the difference between continuous and discrete data, including numeric and categorical types, with real-world examples like time, money, height, and weight.
Categorical Data Vs Count Data (6:45)
Contrast categorical data and count data within discrete data, covering binary and multi-categorical data with churn, default, and claim examples, and classify data as nominal, ordinal, interval, or ratio.
Practical Data Understanding using Realtime Examples (11:15)
Explore practical data understanding by distinguishing nominal, ordinal, and interval data through real-world travel examples, illustrating their properties, limitations, and implications for interpretation.
Scale of Measurement (3:34)
Explore the scales of measurement, from nominal, ordinal, and interval to ratio data; learn how counts, frequencies, and rankings differ, and why ratio data supports the widest range of statistical analysis.
Quantitative Vs Qualitative (5:04)
Explore the difference between quantitative and qualitative data, including structured, unstructured, and categorical data, and see how numeric temperature readings and qualitative observations inform decisions.
Structured Vs Unstructured Data (13:04)
Identify the differences between structured data in tabular form and unstructured data such as videos, images, audio, and text, and note semi-structured types and conversion approaches.
Cross Sectional Vs Time Series Vs Panel Longitudinal Data (7:01)
Compare cross-sectional, time series, and panel (longitudinal) data, noting how the importance of date and time, and whether the data has a single date column or multiple columns, guides the choice of analytic technique.

What is Data Collection? (4:12)
Discusses data collection and contrasts primary and secondary data sources, with examples. Explains common terms for outputs and inputs, and the role of rows, columns, and structured datasets.
Understanding Primary Data Sources (22:15)
Understand primary data sources and how external data from social media and IoT sensors can improve loan default models, while examining data privacy, quality, and regulatory risks.
Understanding Secondary Data Sources (13:31)
Analyze secondary and primary data sources to improve telecom planning by integrating internal data sources with Google Maps and drone analytics.
Understanding Data Collection Using Survey (6:46)
Translate business reality into data collection using survey design by performing root cause analysis, setting research objectives, and crafting constructs and dimensional survey questions on customer preferences and price sensitivity.
Understanding Data Collection Using DoE (7:15)
Learn how to design experiments for data collection and optimize mobile promotions by testing discount levels, expiry dates, and customer proximity to reveal redemption patterns.
Understanding possible errors in Data Collection Stage (16:21)
Identify and mitigate common data collection errors, from faulty measurement devices to environmental factors, using gauge R&R, attribute agreement analysis, and standard operating procedures.
Understanding Bias and Fairness (5:17)
Understand bias and fairness in data science by ensuring diverse data and avoiding sensitive attributes in models. Spend most of the effort on business understanding and data preparation before algorithm selection.

Introduction to CRISP-ML(Q) Data preparation & Agenda (2:08)
Summarize the CRISP-ML(Q) framework, its six phases, and the emphasis on business and data understanding: objectives, constraints, the project charter, and secondary and primary data sources.
What is Probability? (5:33)
Learn the probability formula (probability equals the number of favorable outcomes divided by the total number of outcomes) and apply it to rolling a die to find the probability of an outcome above three.
What is Random Variable? (12:00)
Explore random variables by separating "random" from "variable", using coin and die outcomes to illustrate probabilities and the convention of an uppercase letter (X) for the random variable versus lowercase (x) for its values.
Understanding Probability and its Application, Probability Discussion (13:17)
Explore how probability shapes real-world decisions, from distributions of daily sales to discrete vs continuous data, random variables, and how AI and courts use probabilistic reasoning.
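A small worked example of the classical probability formula and of a discrete random variable, assuming a fair die and fair coins whose outcomes are equally likely. The helper name probability is hypothetical; Fraction keeps the results exact instead of rounding to floats.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def probability(is_event, sample_space):
    """Classical probability: favourable outcomes / total outcomes.
    Assumes every outcome in sample_space is equally likely."""
    favourable = sum(1 for outcome in sample_space if is_event(outcome))
    return Fraction(favourable, len(sample_space))

# Rolling one fair die: outcomes 4, 5, 6 are above three, so P = 3/6
die = [1, 2, 3, 4, 5, 6]
p_above_three = probability(lambda x: x > 3, die)

# Random variable X = number of heads in two fair coin tosses.
# Uppercase X names the variable; lowercase x values are what it can take.
tosses = list(product("HT", repeat=2))  # ('H','H'), ('H','T'), ('T','H'), ('T','T')
counts = Counter(outcome.count("H") for outcome in tosses)
pmf = {x: Fraction(n, len(tosses)) for x, n in sorted(counts.items())}

print(p_above_three)  # 1/2
print(pmf)            # {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
```

The probabilities in pmf sum to 1, as they must for any probability distribution.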

Understanding Normal Distribution (15:42)
Explore the normal distribution as a continuous probability distribution for a variable like height or profit, with a total area under the curve of one and values ranging from minus infinity to plus infinity.
What is Inferential Statistics? (10:41)
Explore inferential statistics: drawing inferences about a population from a sample using a sampling frame and simple random sampling, the role of bias, hypothesis testing, and parametric and nonparametric methods.
Understanding Standard Normal Distribution & what are Z Scores? (28:16)
Explore the standard normal distribution and z-scores, explaining symmetry, the mean, and sigma levels from one to six, plus how to standardize data using z = (x − μ)/σ.
Understanding Measures of Central Tendency (First moment business decision) (10:54)
Explore measures of central tendency (mean, median, and mode), the first moment business decision, which describe where the center of the data lies.
Understanding Measures of Dispersion (Second moment business decision) (10:54)
Explore measures of dispersion and the second moment in business decision making, using HSBC profit data across Malaysia and Singapore to compare volatility, forecasting, and control charts.
Understanding Box Plot (Difference between Percentile, Quantile and Quartile) (6:17)
Explore the differences among percentiles, quartiles, and quantiles within a box plot, including how Q1, Q2 (median), and Q3 map to the 25th, 50th, and 75th percentiles between the minimum and maximum.
Understanding Graphical Techniques - Q-Q Plot (8:41)
Explore graphical techniques for assessing normality with a normal Q-Q plot. Compare it with histograms and box plots, and understand sample versus theoretical quantiles and standardized values.
Understanding Bivariate Scatter Plot (35:36)
Explore the bivariate scatter plot to examine the direction and strength of correlation between two numerical variables. Identify linear and nonlinear patterns, including quadratic and exponential relationships, and spot outliers.

Python Installation (6:07)
Install Python from python.org (the lecture uses version 3.10.7) and explore Python as an operating-system-agnostic, open-source tool that is free for individuals and organizations.
Anaconda Installation (7:00)
Learn to install the Anaconda distribution on Windows, macOS, and Linux. Because it is OS-agnostic and ships with pre-installed libraries, it saves time and avoids version conflicts; Anaconda is free for individuals (paid for commercial use).
Understanding Anaconda Navigator, Spyder & Python Libraries (24:30)
Explore Anaconda Navigator, Spyder, and Python libraries; compare IDEs like Jupyter and Colab; and learn data loading and manipulation with pandas and visualization with matplotlib and seaborn.
Understanding Jupyter and Google Colab (8:41)
Learn to set up and use Jupyter and Google Colab for Python data analysis: run code, import pandas, and read CSV datasets with practical, hands-on examples in a notebook environment.

Understanding Data Cleansing - Typecasting (10:32)
Explore data cleansing and typecasting as core data preprocessing steps, turning unstructured log data into structured, correctly typed inputs for Python while addressing duplicates, missing values, and outliers.
Understanding Data Cleansing - Typecasting Using Python (15:42)
Learn data cleansing and typecasting in Python using pandas: read CSV data, inspect dtypes, and convert columns with astype to the proper object, integer, or float types.
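A minimal pandas sketch of the typecasting step, assuming a small in-memory CSV in place of the course dataset (the column names and values are hypothetical). Every column is first read as text to mimic untyped input; astype then converts the clean columns, and pd.to_numeric with errors="coerce" turns an unparseable entry into NaN instead of raising an error.

```python
import io

import pandas as pd

# Hypothetical raw extract; "unknown" is a typical dirty value in a numeric column
raw_csv = io.StringIO(
    "id,age,salary,joined\n"
    "1,34,52000.50,2021-03-01\n"
    "2,41,61000.00,2019-07-15\n"
    "3,unknown,48500.75,2022-01-10\n"
)

# Read everything as text first to mimic untyped input
df = pd.read_csv(raw_csv, dtype=str)
print(df.dtypes)  # every column is a text dtype at this point

# Clean columns can be cast directly with astype
df = df.astype({"id": "int64", "salary": "float64"})

# A column with bad entries: coerce unparseable values to NaN instead of failing
df["age"] = pd.to_numeric(df["age"], errors="coerce")

# Dates get their own converter
df["joined"] = pd.to_datetime(df["joined"], format="%Y-%m-%d")
print(df.dtypes)
```

Calling astype("int64") on the age column would raise a ValueError because of the "unknown" entry; coercing first and then deciding how to treat the resulting missing value keeps the cleansing step explicit.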

Understanding Handling Duplicates (10:48)
Identify and consolidate duplicate records across accounts using master data management and data quality concepts, then remove duplicate rows and columns to reduce redundancy, including cases where columns are highly correlated.
Understanding Handling Duplicates using Python (25:26)
Identify and handle duplicate records in the mtcars dataset with pandas in Python. Explore the keep options ("first", "last", False) and drop_duplicates() to retain or remove duplicates.
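The keep options can be seen on a tiny hypothetical frame that stands in for mtcars (the rows below are made up for illustration). duplicated() marks repeated rows, and drop_duplicates() removes them using the same keep and subset parameters.

```python
import pandas as pd

# Small hypothetical frame standing in for the mtcars data used in the lecture
cars = pd.DataFrame({
    "model": ["Mazda RX4", "Mazda RX4", "Datsun 710", "Valiant", "Valiant"],
    "mpg": [21.0, 21.0, 22.8, 18.1, 18.1],
    "cyl": [6, 6, 4, 6, 6],
})

# duplicated() flags rows that repeat another row; keep decides which copy is NOT flagged
print(cars.duplicated().tolist())             # [False, True, False, False, True]
print(cars.duplicated(keep="last").tolist())  # [True, False, False, True, False]
print(cars.duplicated(keep=False).tolist())   # [True, True, False, True, True]

# drop_duplicates() removes the flagged rows
deduped = cars.drop_duplicates()                # keeps the first copy: 3 rows remain
only_unique = cars.drop_duplicates(keep=False)  # drops every copy: only "Datsun 710" remains

# Duplicates can also be judged on a subset of columns
by_cyl = cars.drop_duplicates(subset=["cyl"])   # first row for each cylinder count
```

keep=False is useful when a repeated record signals a data quality problem and no copy should be trusted; the default keep="first" suits plain re-ingestion duplicates.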

Requirements

Basic Programming Skills: A fundamental understanding of programming concepts is important. Most text mining involves working with programming languages like Python or R to process and analyze text data.
Basic Statistics and Data Analysis: Familiarity with basic statistical concepts and data analysis techniques will be helpful, as these are often used in text mining to analyze and interpret the results.
Knowledge of Data Structures: Understanding data structures like arrays, lists, dictionaries, and data frames is essential for handling and manipulating text data effectively.

Description

Unlock the power of textual data with our comprehensive Text Mining course. In today's data-driven world, extracting valuable insights from text has become crucial for businesses and organizations. This course equips you with the skills and techniques needed to effectively analyze, process, and derive meaningful information from textual data sources.

In today's digital age, where data is generated in staggering amounts, the potential insights hidden within textual data have become increasingly significant. Text mining, a discipline that combines data mining and natural language processing (NLP), has become a powerful technique for extracting valuable information from unstructured text. This comprehensive course delves into the intricacies of text mining, equipping you with a deep understanding of its fundamentals, techniques, and applications.

Text mining, also known as text analytics or text data mining, involves the process of transforming unstructured textual data into structured and actionable insights. As text data proliferates across various domains such as social media, customer reviews, news articles, and research papers, the ability to process and analyze this data has become a critical skill for professionals in fields ranging from business and marketing to healthcare and academia.

One of the first steps in text mining is text preprocessing. Raw text data often contains noise, irrelevant information, and inconsistencies. In this course, you'll learn how to clean and preprocess text using techniques like tokenization, which breaks text down into individual words or phrases, and stemming, which strips suffixes to reduce related word forms to a common stem (for example, "delayed" and "delaying" both become "delay"). Additionally, you'll explore methods to remove common stopwords (words that add little semantic value) while considering the nuances of different languages and domains.
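A minimal plain-Python sketch of the preprocessing pipeline described above: lower-casing and tokenization, stopword removal, then stemming. The stopword list is a small hypothetical sample, and naive_stem is a toy suffix stripper written for illustration, not the Porter stemmer; in practice you would use a library implementation such as NLTK's PorterStemmer. As with real stemmers, the resulting stems need not be dictionary words.

```python
import re

# Small hypothetical stopword list; libraries such as NLTK ship much longer ones
STOPWORDS = {"the", "a", "an", "and", "is", "are", "was", "of", "to", "in", "it", "this"}

def tokenize(text):
    """Lower-case the text and keep runs of letters as tokens (punctuation is dropped)."""
    return re.findall(r"[a-z]+", text.lower())

def naive_stem(word):
    """Toy suffix stripper for illustration only; it is NOT the Porter algorithm."""
    for suffix in ("ing", "ed", "es", "s"):
        # only strip when at least three characters of the word remain
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    tokens = tokenize(text)
    tokens = [t for t in tokens if t not in STOPWORDS]
    return [naive_stem(t) for t in tokens]

print(preprocess("The delivery was delayed and the boxes arrived damaged!"))
# ['delivery', 'delay', 'box', 'arriv', 'damag']
```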

A key challenge in text mining lies in representing text in a format that machine learning algorithms can process. This course covers several text representation methods, including the bag-of-words model and Term Frequency-Inverse Document Frequency (TF-IDF) weighting. Bag-of-words records how often each word occurs in a document, while TF-IDF scales those counts down for words that appear in many documents of the corpus, so distinctive terms carry more weight. You'll also explore more advanced methods like word embeddings, dense vector representations that place semantically related words close together.
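A from-scratch sketch of bag-of-words counts and TF-IDF weights on three hypothetical one-line reviews. It uses term frequency divided by document length and the unsmoothed idf = ln(N / df); this is one common textbook variant, and scikit-learn's TfidfVectorizer uses a smoothed idf and L2 normalization by default, so its numbers will differ.

```python
import math
from collections import Counter

# Three hypothetical one-line reviews, already lower-cased
docs = ["good battery good screen", "poor battery", "good price"]
tokenized = [doc.split() for doc in docs]
vocab = sorted({term for doc in tokenized for term in doc})

# Bag of words: one row per document, one column per vocabulary term (raw counts)
bow = [[Counter(doc)[term] for term in vocab] for doc in tokenized]

# Document frequency: number of documents that contain each term
doc_freq = {term: sum(term in doc for doc in tokenized) for term in vocab}
n_docs = len(tokenized)

def tfidf(doc):
    """TF = count / document length; IDF = ln(N / df) (unsmoothed variant)."""
    counts = Counter(doc)
    return {
        term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
        for term, count in counts.items()
    }

weights = [tfidf(doc) for doc in tokenized]

print(vocab)  # ['battery', 'good', 'poor', 'price', 'screen']
print(bow)    # [[1, 2, 0, 0, 1], [1, 0, 1, 0, 0], [0, 1, 0, 1, 0]]
print({t: round(w, 3) for t, w in weights[0].items()})
# {'good': 0.203, 'battery': 0.101, 'screen': 0.275}
```

In the first review, "screen" gets the highest weight even though "good" occurs twice, because "good" also appears in another review while "screen" appears only here.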

Natural Language Processing (NLP) forms the backbone of text mining, and this course introduces you to its essentials. You'll learn about parts-of-speech tagging, which involves identifying the grammatical components of a sentence, and named entity recognition, a process of identifying and classifying entities such as names, dates, and locations within text. Understanding syntactic analysis further enhances your ability to extract grammatical structures and relationships from sentences.
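To make named entity recognition concrete, here is a deliberately simple rule-based tagger: a hypothetical gazetteer (lookup table) of organization and location names plus a regular expression for dates. This is not how statistical NER in spaCy works; spaCy uses trained models, typically loaded with spacy.load("en_core_web_sm") and read from doc.ents. The sketch only shows the input and output shape of the task, and it does not cover parts-of-speech tagging.

```python
import re

# Hypothetical gazetteer: known entity strings mapped to labels
GAZETTEER = {
    "HSBC": "ORG",
    "Deloitte": "ORG",
    "Singapore": "LOC",
    "Malaysia": "LOC",
}

# Dates written like "12 March 2021" or "5 Jan 2020"
DATE_PATTERN = re.compile(
    r"\b\d{1,2} (?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]* \d{4}\b"
)

def tag_entities(text):
    """Return (entity_text, label, start_offset) tuples in order of appearance."""
    entities = []
    for name, label in GAZETTEER.items():
        # word boundaries stop "HSBC" from matching inside a longer token
        for match in re.finditer(rf"\b{re.escape(name)}\b", text):
            entities.append((match.group(), label, match.start()))
    for match in DATE_PATTERN.finditer(text):
        entities.append((match.group(), "DATE", match.start()))
    return sorted(entities, key=lambda e: e[2])

entities = tag_entities("HSBC opened a Singapore office on 12 March 2021.")
print(entities)
# [('HSBC', 'ORG', 0), ('Singapore', 'LOC', 14), ('12 March 2021', 'DATE', 34)]
```

A gazetteer only finds names it already knows, which is exactly the limitation that trained NER models address.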

Sentiment analysis, a pivotal application of text mining, determines the emotional tone or sentiment expressed in text. Businesses can use it to gauge customer opinions and make informed decisions, while social media platforms can monitor public sentiment about specific topics or brands. Through practical exercises and projects, you'll learn to classify text as positive, negative, or neutral and extract useful insights from customer reviews, social media posts, and more.
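A minimal lexicon-based sentiment scorer, assuming a tiny hypothetical word list with hand-picked scores and a one-word negation rule. Production systems use much larger lexicons (for example VADER, available in NLTK) or trained classifiers; this sketch only illustrates how text can be mapped to positive, negative, or neutral.

```python
import re

# Tiny hypothetical lexicon with hand-picked scores
LEXICON = {"good": 1, "great": 2, "excellent": 2, "happy": 1,
           "bad": -1, "poor": -1, "terrible": -2, "late": -1}
NEGATIONS = {"not", "never", "no"}

def sentiment(text):
    """Sum word scores, flipping the sign of a word directly preceded by a negation."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, token in enumerate(tokens):
        value = LEXICON.get(token, 0)
        if i > 0 and tokens[i - 1] in NEGATIONS:
            value = -value
        score += value
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return label, score

print(sentiment("Great phone, excellent camera"))               # ('positive', 4)
print(sentiment("Delivery was late and support was not good"))  # ('negative', -2)
print(sentiment("The parcel arrived today"))                    # ('neutral', 0)
```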

In the realm of information retrieval, text mining shines as a mechanism to efficiently navigate and extract relevant information from large corpora of text. Techniques like Boolean retrieval, which involves using logical operators to search for specific terms, and TF-IDF ranking, which ranks documents based on term importance, are covered extensively. Moreover, you'll delve into the architecture of search engines, gaining insights into how modern search platforms like Google operate behind the scenes.
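Boolean retrieval and TF-IDF ranking both rest on an inverted index that maps each term to the documents containing it. The sketch below builds one over three hypothetical documents; AND and OR queries become set intersection and union, and ranking sums tf × idf over the query terms, using an unsmoothed idf = ln(N / df) as a simplifying assumption. The function names are illustrative, not a particular search library's API.

```python
import math
from collections import Counter, defaultdict

# Three hypothetical documents keyed by id
docs = {
    1: "loan default risk model",
    2: "credit card fraud model",
    3: "loan approval process",
}

# Inverted index: term -> set of document ids containing it
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

def boolean_and(*terms):
    """Documents containing every query term (AND = set intersection)."""
    postings = [index.get(t, set()) for t in terms]
    return set.intersection(*postings) if postings else set()

def boolean_or(*terms):
    """Documents containing at least one query term (OR = set union)."""
    return set().union(*(index.get(t, set()) for t in terms))

def rank(query):
    """Order matching documents by summed TF-IDF of the query terms, highest first."""
    n = len(docs)
    scores = Counter()
    for term in query.split():
        if term not in index:
            continue
        idf = math.log(n / len(index[term]))
        for doc_id in index[term]:
            tf = docs[doc_id].split().count(term)
            scores[doc_id] += tf * idf
    return [doc_id for doc_id, _ in scores.most_common()]

print(boolean_and("loan", "model"))     # {1}
print(boolean_or("fraud", "approval"))  # {2, 3}
print(rank("loan default"))             # [1, 3]
```

Document 1 ranks first because it contains both query terms, and "default" is rarer across the collection than "loan", so it contributes a larger idf.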

The course doesn't stop at theory: it gives you hands-on experience with popular text mining tools and libraries. You'll work with NLTK (Natural Language Toolkit), spaCy, scikit-learn, and gensim, among others, gaining proficiency in applying these tools to real-world text mining scenarios. These practical sessions build your confidence in implementing the concepts you've learned, ensuring you're well-prepared for actual text mining projects.

The course centers on real-world projects that let you apply your newly acquired skills to actual problems. From analyzing customer feedback sentiment for a product to categorizing research articles into relevant topics, you'll work with diverse datasets to solve challenges faced across industries. These projects not only bolster your portfolio but also prepare you to tackle real-world text mining scenarios, enhancing your employability and value as a professional.

Ethics is an essential part of text mining. As you extract insights from textual data, you'll encounter privacy concerns, potential biases, and the responsibility to ensure your analysis is fair. This course addresses these challenges, emphasizing the importance of maintaining data privacy and being transparent about the methods used in text mining.

Text mining is an evolving field, and staying abreast of its future trends is crucial. The course introduces you to the cutting-edge advancements in the field, including the integration of deep learning techniques for text analysis and the fusion of text data with other data types like images and structured data. By keeping up with these trends, you'll position yourself as a forward-thinking data professional capable of harnessing the latest tools and methodologies.

In conclusion, this course equips you with the skills and knowledge to navigate the world of unstructured text data. From preprocessing and representation to sentiment analysis, information retrieval, and ethical considerations, you'll gain a comprehensive understanding of text mining's intricacies. Real-world projects and hands-on exercises solidify your expertise, making you well-prepared to tackle text mining challenges across industries. Embark on this journey to unlock the wealth of insights hidden within textual data and propel your career forward in the age of data-driven decision-making.

Who this course is for:

Data Scientists and Analysts: Data professionals looking to expand their skillset to include text analysis techniques in order to gain deeper insights from unstructured text data.
Business Analysts: Professionals seeking to enhance their understanding of customer sentiments, market trends, and competitive intelligence by analyzing text data from sources like social media, reviews, and customer feedback.
Researchers and Academics: Scholars and researchers aiming to extract information from large text corpora for academic purposes, such as sentiment analysis, topic modeling, and content analysis.
Natural Language Processing (NLP) Enthusiasts: Individuals interested in diving into the field of NLP and exploring techniques to process and analyze human language through machine learning and computational linguistics.

Course sections

Introduction: 1 lecture • 3min
Introduction about CRISP-ML(Q): 5 lectures • 19min
Business Understanding Phase: 3 lectures • 37min
Data Understanding Phase - Data Types: 9 lectures • 1hr 5min
Data Understanding Phase - Data Collection: 7 lectures • 1hr 16min
Understanding Basic Statistics: 4 lectures • 33min
Data Preparation Phase - Exploratory Data Analysis (EDA): 8 lectures • 2hr 7min
Python Installation and Setup: 4 lectures • 46min
Data Preparation Phase | Data Cleansing - Type Casting: 2 lectures • 26min
Data Preparation Phase | Data Cleansing - Handling Duplicates: 2 lectures • 36min
