
Meet Bharani Kumar Depuru, an industry veteran with 16 years of experience and an Industry 4.0 implementer who has worked with HSBC, ITC Infotech, Infosys, and Deloitte. He leads Innodatatics and 360DigiTMG.
Explore the agenda and stages of analytics within the data science training program. Understand how the project management methodology frames real-world analytics work.
Discover diagnostic analytics by asking why events happen, such as spikes and drops in COVID-19 cases, and tagging them with reasons like lockdowns and vaccination.
Prescriptive analytics uses predictions and what-if analysis to determine actions across industries, from vaccine strategies to automated shutdowns. The lecture also recaps the four analytics stages: descriptive, diagnostic, predictive, and prescriptive.
Explore predictive analytics by using current data to forecast future outcomes, such as disease cases or vaccination rates, while assessing validity across various time horizons and changing conditions.
The lecture introduces CRISP-ML(Q), the Cross-Industry Standard Process for Machine Learning with Quality Assurance, a six-phase methodology covering business and data understanding; data preparation; model building and tuning; evaluation; deployment; and monitoring and maintenance.
Define the scope of application in the business understanding phase, set objectives and constraints, and use historical data to predict loan default and optimize profits.
Define the business success criteria by tying goals to KPIs, such as reducing loan defaults under 5%, and balance this with machine learning accuracy, performance, and ROI expectations.
Explore business understanding and use cases, balancing fraud reduction with customer convenience in credit card transactions, and illustrate objective setting under cost constraints in agriculture and drone-driven precision farming.
Explore data types and scales of measurement as part of the data understanding agenda, clarify key terms, and compare primary and secondary data collection techniques for text mining and NLP.
Understand data as measured information for analysis and modeling to enable predictions and optimization, using sales scenarios and what-if analysis for management decisions.
Explore the difference between continuous and discrete data, including numeric and categorical types, with real-world examples like time, money, height, and weight.
Contrast categorical data and count data within discrete data, covering binary and multi-category data with churn, default, and claim examples; classify data as nominal, ordinal, interval, or ratio.
Explore practical data understanding by distinguishing nominal, ordinal, and interval data through real-world travel examples, illustrating their properties, limitations, and implications for interpretation.
Explore scale of measurement from nominal, ordinal, and interval to ratio data; learn how counts, frequencies, and ranking differ, and why ratio data enables the most comprehensive statistical analysis.
Explore the difference between quantitative and qualitative data, including structured, unstructured, and categorical data, and see how numeric temperature and qualitative observations inform decisions.
Identify the differences between structured data in tabular form and unstructured data such as videos, images, audio, and text, and note semi-structured types and conversion approaches.
Compare cross-sectional, time series, and panel (longitudinal) data, noting how the importance of date and time and the data structure (multiple columns versus a single date column) guide the choice of analytic technique.
Discuss data collection and contrast primary and secondary data sources, with examples. Learn the common data terms for outputs and inputs, and the role of rows, columns, and structured datasets.
Understand primary data sources and how external data from social media and IoT sensors can improve loan default models, while examining data privacy, quality, and regulatory risks.
Analyze secondary and primary data sources to improve telecom planning by integrating internal data sources with Google Maps and drone analytics.
Translate business reality into data collection using survey design by performing root cause analysis, setting research objectives, and crafting constructs and dimensional survey questions on customer preferences and price sensitivity.
Learn how to design experiments for data collection and optimize mobile promotions by testing discount levels, expiry dates, and customer proximity to reveal redemption patterns.
Identify and mitigate common data collection errors, from faulty measurement devices to environmental factors, using gauge R&R, attribute agreement analysis, and standard operating procedures.
Understand bias and fairness in data science by ensuring diverse data and avoiding sensitive attributes in models. Spend most effort on business understanding and data preparation before algorithm selection.
Summarize the CRISP-ML(Q) framework, its six phases, and the emphasis on business and data understanding, objectives, constraints, project charter, and secondary and primary data sources.
Learn the probability formula: probability equals the number of events of interest divided by the total number of events. Apply it to rolling a die to find the probability of an outcome above three.
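The die example can be worked through in a few lines of Python. This is a minimal sketch that assumes a fair six-sided die; the variable names are illustrative and do not come from the lecture.

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die (assumed fair).
outcomes = [1, 2, 3, 4, 5, 6]

# Events of interest: outcomes strictly greater than three.
events_of_interest = [o for o in outcomes if o > 3]

# Probability = number of events of interest / total number of events.
p_above_three = Fraction(len(events_of_interest), len(outcomes))

print(events_of_interest)   # [4, 5, 6]
print(p_above_three)        # 1/2
```

Using Fraction keeps the result exact (1/2) instead of a rounded decimal.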
Explore random variables by separating random from variable, using coin and die outcomes to illustrate probabilities and the uppercase random variable versus lowercase values.
Explore how probability shapes real-world decisions, from distributions of daily sales to discrete vs continuous data, random variables, and how AI and courts use probabilistic reasoning.
Explore the normal distribution as a continuous probability distribution for a variable like height or profit, with the area under the curve equal to one and values ranging from minus infinity to plus infinity.
Explore inferential statistics, drawing inferences about a population from a sample using a sampling frame, simple random sampling, and bias, plus hypothesis testing, parametric and nonparametric methods.
Explore the standard normal distribution and z-scores, explaining symmetry, mean, and sigma levels from one to six, plus how to standardize data using z = (x−μ)/σ.
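The standardization formula z = (x − μ)/σ can be tried on a small sample. The sketch below uses made-up height values purely for illustration and treats them as a full population, so it uses the population standard deviation.

```python
from statistics import mean, pstdev

# Hypothetical heights in centimetres, for illustration only.
heights = [150, 160, 170, 180, 190]

mu = mean(heights)        # population mean
sigma = pstdev(heights)   # population standard deviation

# Standardize each value: z = (x - mu) / sigma.
z_scores = [(x - mu) / sigma for x in heights]

print(mu)
print([round(z, 3) for z in z_scores])
```

After standardization the values have mean 0 and standard deviation 1, which is what allows them to be compared against the standard normal distribution.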
Explore measures of dispersion as variation in profits across locations, the second moment business decision. See control charts and outliers with examples from chocolate and orange juice.
Explore measures of dispersion and the second moment in business decision making, using HSBC profit data across Malaysia and Singapore to compare volatility, forecasting, and control charts.
Explore the differences among percentiles, quartiles, and quantiles within a box plot, including how Q1, Q2 (median), Q3 map to 25th, 50th, and 75th percentiles, from min to max.
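The mapping between quartiles and percentiles can be checked with Python's standard library. This is a small sketch on illustrative data; it assumes the "inclusive" interpolation method, and other quantile methods can give slightly different cut points on the same data.

```python
from statistics import quantiles

# Illustrative data only; any numeric sample would do.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Quartiles: the three cut points that split the data into four parts.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")

# Percentiles: the 99 cut points that split the data into 100 parts.
percentiles = quantiles(data, n=100, method="inclusive")

print(min(data), q1, q2, q3, max(data))   # the five values a box plot draws
print(percentiles[24], percentiles[49], percentiles[74])
```

Index 24 of the percentile list is the 25th percentile, which matches Q1; likewise the 50th matches Q2 (the median) and the 75th matches Q3.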
Explore graphical techniques for assessing normality with a normal Q-Q plot. Compare histograms and box plots, and understand sample versus theoretical quantiles and standardized values.
Explore the bivariate scatter plot to examine the direction and strength of correlation between two numerical variables. Identify linear and nonlinear patterns, including quadratic and exponential relationships, and spot outliers.
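Direction and strength of a linear relationship are commonly summarized with the Pearson correlation coefficient. The sketch below computes it by hand on illustrative data; the function name pearson_r and the sample values are assumptions for the example, not material from the lecture.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

x = [1, 2, 3, 4, 5]
increasing = [2 * v + 1 for v in x]      # perfectly linear, positive direction
decreasing = [10 - v for v in x]         # perfectly linear, negative direction
quadratic = [(v - 3) ** 2 for v in x]    # symmetric U-shaped pattern

print(pearson_r(x, increasing))
print(pearson_r(x, decreasing))
print(pearson_r(x, quadratic))
```

The U-shaped series has a clear pattern but a coefficient of zero, which is one reason to look at the scatter plot and not only at the number.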
Install Python from python.org, using the latest version at the time, 3.10.7, and explore Python as an operating-system-agnostic, open-source tool that is free for individuals and organizations.
Learn to install the Anaconda distribution on Windows, macOS, and Linux. It is OS-agnostic, and its pre-installed libraries save time by avoiding version conflicts; Anaconda is free for individuals (paid for commercial use).
Explore Anaconda Navigator, Spyder, and Python libraries; compare IDEs like Jupyter and Colab, and learn data loading, manipulation with pandas, and visualization with matplotlib and seaborn.
Learn to set up and use Jupyter and Google Colab for Python data analysis, run code, import pandas, and read CSV datasets with practical, hands-on examples in a notebook environment.
Explore data cleansing and typecasting as core data pre-processing steps, turning unstructured log data into structured, correctly typed inputs for Python while addressing duplicates, missing values, and outliers.
Learn data cleansing and typecasting in Python using pandas, exploring how to read CSV data, inspect dtypes, and convert columns with astype to proper object, integer, or float types.
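A typecasting pass with pandas might look like the sketch below. The column names and values are hypothetical, and the CSV is read from an in-memory string so the example runs without a file on disk.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a real file.
csv_text = """emp_id,salary,age
101,52000.0,34
102,61000.0,41
103,58000.0,29
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df.dtypes)   # inspect the types pandas inferred

# Cast each column to the type that matches its meaning.
df["emp_id"] = df["emp_id"].astype(str)       # identifier, not a quantity
df["salary"] = df["salary"].astype("int64")   # whole currency units
df["age"] = df["age"].astype("float64")       # shown only to illustrate int -> float

print(df.dtypes)   # confirm the conversions
```

Casting an identifier to text prevents accidental arithmetic on it, such as averaging employee IDs.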
Identify and consolidate duplicate records across accounts using master data management and data quality concepts, then remove duplicate rows and columns, reducing redundancy even when high correlation exists.
Identify and handle duplicate records in the mtcars dataset with pandas in Python. Explore the keep options ('first', 'last', False) of duplicated and drop_duplicates to control which duplicates are retained or removed.
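The effect of each keep option is easiest to see on a tiny table. The sketch below uses a hypothetical frame in the style of mtcars, with one repeated row; the values are illustrative and are not taken from the real dataset.

```python
import pandas as pd

# Hypothetical rows; row 2 is an exact repeat of row 0.
cars = pd.DataFrame({
    "model": ["A", "B", "A", "C"],
    "mpg": [21.0, 22.8, 21.0, 18.7],
    "cyl": [6, 4, 6, 8],
})

# duplicated() flags repeats; by default the first occurrence is not flagged.
dup_mask = cars.duplicated()

keep_first = cars.drop_duplicates(keep="first")   # keep the earliest copy
keep_last = cars.drop_duplicates(keep="last")     # keep the latest copy
keep_none = cars.drop_duplicates(keep=False)      # drop every copy of a repeated row

print(dup_mask.tolist())
print(list(keep_first.index), list(keep_last.index), list(keep_none.index))
```

Note that keep=False removes all copies of a duplicated row, so it is the option to use when a repeated record should be treated as unreliable altogether.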
Unlock the power of textual data with our comprehensive Text Mining course. In today's data-driven world, extracting valuable insights from text has become crucial for businesses and organizations. This course equips you with the skills and techniques needed to effectively analyze, process, and derive meaningful information from textual data sources.
In today's digital age, where data is generated in staggering amounts, the potential insights hidden within textual data have become increasingly significant. Text mining, a discipline that combines data mining and natural language processing (NLP), has become a potent technique for extracting valuable information from unstructured written resources. This comprehensive course delves into the intricacies of text mining, equipping you with a deep understanding of its fundamentals, techniques, and applications.
Text mining, also known as text analytics or text data mining, involves the process of transforming unstructured textual data into structured and actionable insights. As text data proliferates across various domains such as social media, customer reviews, news articles, and research papers, the ability to process and analyze this data has become a critical skill for professionals in fields ranging from business and marketing to healthcare and academia.
One of the first steps in text mining is text preprocessing. Raw text data often contains noise, irrelevant information, and inconsistencies. In this course, you'll learn how to clean and preprocess text using techniques like tokenization, which involves breaking down text into individual words or phrases, and stemming, which reduces words to their base or root form. Additionally, you'll explore methods to remove common stopwords—words that add little semantic value—while considering the nuances of different languages and domains.
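The preprocessing steps described above can be sketched with the standard library alone. The stopword list and the suffix-stripping function below are deliberately tiny, hypothetical stand-ins; a real project would more likely use a library stemmer and a fuller, language-specific stopword list.

```python
import re

# A tiny illustrative stopword list (assumption: English text).
STOPWORDS = {"the", "a", "an", "is", "are", "was", "and", "of", "to", "in"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def remove_stopwords(tokens):
    """Drop tokens that appear in the stopword list."""
    return [t for t in tokens if t not in STOPWORDS]

def crude_stem(token):
    """Strip a few common English suffixes. A toy stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

text = "The customers are praising the delivered products."
tokens = tokenize(text)
filtered = remove_stopwords(tokens)
stems = [crude_stem(t) for t in filtered]

print(tokens)
print(filtered)
print(stems)
```

The stem "prais" is not a dictionary word, which is typical of stemming: it aims for a shared root form, not a readable one.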
A key challenge in text mining lies in representing text in a format that machine learning algorithms can comprehend. This course delves into various text representation methods, including the bag-of-words model and Term Frequency-Inverse Document Frequency (TF-IDF) weighting. These techniques quantify the presence and importance of words within a document or corpus. Moreover, you'll delve into more advanced methods like word embeddings, which capture semantic relationships between words and enable machines to understand context.
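To make bag-of-words and TF-IDF concrete, here is a minimal sketch on three made-up documents. It assumes one common textbook variant (term frequency as a share of document length, and idf = ln(N / df)); libraries such as scikit-learn apply different smoothing and normalization, so their numbers will not match these.

```python
import math
from collections import Counter

# Made-up documents, for illustration only.
docs = [
    "good service good price",
    "bad service",
    "good price",
]

tokenized = [d.split() for d in docs]

# Bag-of-words: one word-count table per document, ignoring word order.
bags = [Counter(tokens) for tokens in tokenized]

n_docs = len(tokenized)
vocab = sorted({t for tokens in tokenized for t in tokens})

# Document frequency: in how many documents each term appears.
doc_freq = {t: sum(1 for tokens in tokenized if t in tokens) for t in vocab}

def tf_idf(term, doc_index):
    """TF-IDF of a term in one document (variant assumed in the lead-in)."""
    tf = bags[doc_index][term] / len(tokenized[doc_index])
    idf = math.log(n_docs / doc_freq[term])
    return tf * idf

print(bags[0])
print(round(tf_idf("bad", 1), 3), round(tf_idf("service", 1), 3))
```

In the second document, "bad" scores higher than "service" because it appears in fewer documents across the corpus, which is the behaviour the weighting is designed to produce.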
Natural Language Processing (NLP) forms the backbone of text mining, and this course introduces you to its essentials. You'll learn about parts-of-speech tagging, which involves identifying the grammatical components of a sentence, and named entity recognition, a process of identifying and classifying entities such as names, dates, and locations within text. Understanding syntactic analysis further enhances your ability to extract grammatical structures and relationships from sentences.
Sentiment analysis, a pivotal application of text mining, enables you to determine the emotional tone or sentiment expressed in text. Businesses can leverage sentiment analysis to gauge customer opinions and make informed decisions, while social media platforms can monitor public sentiment about specific topics or brands. You'll learn how to categorize text as positive, negative, or neutral through practical exercises and projects, enabling you to extract valuable information from customer reviews, social media posts, and more.
In the realm of information retrieval, text mining shines as a mechanism to efficiently navigate and extract relevant information from large corpora of text. Techniques like Boolean retrieval, which involves using logical operators to search for specific terms, and TF-IDF ranking, which ranks documents based on term importance, are covered extensively. Moreover, you'll delve into the architecture of search engines, gaining insights into how modern search platforms like Google operate behind the scenes.
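Boolean retrieval is usually built on an inverted index, which maps each term to the set of documents containing it. The sketch below is a simplified illustration on made-up documents; it is not a description of how any particular search engine is implemented.

```python
from collections import defaultdict

# Made-up documents keyed by ID, for illustration only.
docs = {
    1: "text mining extracts insight from text",
    2: "search engines rank documents",
    3: "text search uses an inverted index",
}

# Inverted index: term -> set of document IDs that contain the term.
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.lower().split():
        index[term].add(doc_id)

all_ids = set(docs)

# Logical operators map directly onto set operations.
text_and_search = index["text"] & index["search"]   # AND
text_or_search = index["text"] | index["search"]    # OR
not_search = all_ids - index["search"]              # NOT

print(text_and_search, text_or_search, not_search)
```

Boolean retrieval only decides whether a document matches; ordering the matches by relevance is where a ranking scheme such as TF-IDF comes in.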
The course doesn't stop at theory—it empowers you with hands-on experience using popular text mining tools and libraries. You'll work with NLTK (Natural Language Toolkit), spaCy, scikit-learn, and gensim, among others, gaining proficiency in applying these tools to real-world text mining scenarios. These practical sessions enhance your confidence in implementing the concepts you've learned, ensuring you're well-prepared for actual text mining projects.
This course's main focus is on real-world projects that let you use your newly acquired abilities to solve actual issues. From analyzing customer feedback sentiment for a product to categorizing research articles into relevant topics, you'll work with diverse datasets to solve challenges faced across industries. These projects not only bolster your portfolio but also prepare you to tackle real-world text mining scenarios, enhancing your employability and value as a professional.
Ethical considerations are essential in text mining. As you extract insights from textual data, you'll encounter privacy concerns, potential biases, and the responsibility to keep your analysis fair. This course addresses these ethical challenges, emphasizing the importance of maintaining data privacy and being transparent about the methods used in text mining.
Text mining is an evolving field, and staying abreast of its future trends is crucial. The course introduces you to the cutting-edge advancements in the field, including the integration of deep learning techniques for text analysis and the fusion of text data with other data types like images and structured data. By keeping up with these trends, you'll position yourself as a forward-thinking data professional capable of harnessing the latest tools and methodologies.
In conclusion, the Text Mining Fundamentals and Applications course equips you with the skills and knowledge to navigate the world of unstructured text data. From preprocessing and representation to sentiment analysis, information retrieval, and ethical considerations, you'll gain a comprehensive understanding of text mining's intricacies. Real-world projects and hands-on exercises solidify your expertise, making you well-prepared to tackle text mining challenges across industries. Embark on this journey to unlock the wealth of insights hidden within textual data and propel your career forward in the age of data-driven decision-making.