
Learn how Python keywords are reserved words and why you cannot use them as identifiers or variable names, and how identifiers and variables support typing in Python 3.8.10 on Colab.
Discover how Python assigns variables with the equals sign, handles int, float, and string values, and understands memory behavior, id, and type to ensure correct operations.
Explore Python basics by assigning variables, understanding types and memory behavior, and practicing strings and lists with indexing, slicing, mutability, and append operations.
Learn how Python comments improve readability, using hashes for single-line notes and triple quotes for multi-line blocks, with start and end markers or repeated hashes.
Learn to improve readability by breaking long lines with a backslash, printing values, and formatting output with curly-brace placeholders and str.format() for values A and B.
Explore Python arithmetic and logical operators, including addition, subtraction, multiplication, division, modulus, floor division, and exponent, then compare values, apply the logical operators and, or, and not, and learn augmented assignment.
Discover how the identity operator uses is to compare variables in Python, revealing when objects share storage. Explore membership with in and not in for lists.
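A minimal sketch of the identity and membership operators described above. The variable names are illustrative and not taken from the lecture.

```python
# Identity (`is`) asks whether two names refer to the same object;
# equality (`==`) asks whether the values match.
a = [1, 2, 3]
b = a            # b is a second name for the same list object
c = [1, 2, 3]    # c is a separate list with equal contents

same_object = a is b                             # one object, two names
equal_but_distinct = (a == c) and (a is not c)   # equal values, different objects

# Membership (`in` / `not in`) asks whether a value occurs in the list.
has_two = 2 in a
lacks_nine = 9 not in a

print(same_object, equal_but_distinct, has_two, lacks_nine)
```

Because a and b share storage, a change made through one name is visible through the other, while c stays independent.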
Explore how for and while loops enable iteration, using range and lists to print sequences and tables efficiently, while emphasizing scalable, minimal-repetition code.
In this module, learn how Python files become modules that store code for reuse, import them with aliases, and selectively import classes to keep programs compact and organized.
Master Python list operations: append vs insert vs extend, and delete methods del, pop, and remove. Learn zero-based indexing, handling duplicates, and how extend differs from append.
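The differences between these list methods are easiest to see side by side. The sketch below uses made-up values to show one call of each.

```python
nums = [10, 20, 30]

nums.append(40)         # adds one element at the end: [10, 20, 30, 40]
nums.insert(1, 15)      # places 15 at index 1:        [10, 15, 20, 30, 40]
nums.extend([50, 60])   # adds each element in turn:   [10, 15, 20, 30, 40, 50, 60]

nested = [1, 2]
nested.append([3, 4])   # append adds the list itself: [1, 2, [3, 4]]

del nums[0]             # delete by index:             [15, 20, 30, 40, 50, 60]
last = nums.pop()       # remove and return the last element (60)

dupes = [20, 30, 20]
dupes.remove(20)        # removes only the first matching value: [30, 20]

print(nums, nested, last, dupes)
```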
Use reverse, in, and not in to access and check list elements; leverage sorted with reverse to view ascending or descending orders. Understand how sort mutates lists and memory references.
Acquire hands-on skills to create and manage sets in Python, including unordered, mutable collections of unique elements; use curly braces or set(), and add or update for multiple values.
Explore strings as immutable, ordered data in Python, and learn indexing, slicing, concatenation, repetition, and use split, join, find, and replace for processing text.
Explore indexing and slicing of arrays, including zero-based starts, negative indices, and reversing; learn that arrays are mutable, support filtering, and use .copy() to create independent copies.
Explore NumPy array operations, including element by element and dot multiplication, and learn shape requirements for matrix multiplication, with examples using 2x3 and 3x2 arrays.
Learn to shape arrays, flatten with ravel, and reshape 1d arrays into 2d or 3d, ensuring element counts match. Master axis-based sorting and argsort for indices without altering the original.
Learn how pandas reads csv into a data frame, inspects data with head and tail, and checks shape and columns.
Master the distinct command in SQL by returning unique city and country combinations and ordering by city and country to reveal non-redundant, structured data.
Master SQL querying by using where clauses to filter rows, select specific columns, order results, and apply distinct to reveal unique values in product data.
Master the and, or, and null conditions in SQL by combining filters with between, greater than, and less than, and handling missing data with is null and is not null.
Explore the like operator and wildcard characters, including percent sign (zero or more characters) and underscore (one character), to find Maria in the customers table for the course.
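To make the two wildcards concrete, here is a small sketch that runs like queries from Python with the built-in sqlite3 module. The table contents and the customer_name column are hypothetical stand-ins, not the course's actual customers table.

```python
import sqlite3

# In-memory database with a made-up customers table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_name TEXT)")
conn.executemany(
    "INSERT INTO customers (customer_name) VALUES (?)",
    [("Maria Anders",), ("Ana Maria Lopez",), ("Mario Rossi",)],
)

def names_like(pattern):
    """Return customer names matching a LIKE pattern, sorted by name."""
    rows = conn.execute(
        "SELECT customer_name FROM customers "
        "WHERE customer_name LIKE ? ORDER BY customer_name",
        (pattern,),
    )
    return [row[0] for row in rows]

starts_with_maria = names_like("Maria%")     # % matches zero or more characters
contains_maria = names_like("%Maria%")       # Maria anywhere in the name
one_char_after_mari = names_like("Mari_ %")  # _ matches exactly one character

conn.close()
print(starts_with_maria, contains_maria, one_char_after_mari)
```

Other database engines may treat letter case in like patterns differently from SQLite, so results on mixed-case data can vary.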
Explore left, right, inner, and full outer joins, plus self joins, and learn how to connect order details and products with on clauses.
Explore left and right joins to preserve data, select specific columns with table.* notation, and join orders with employees to show order details and product names.
Explains inner, left, and full outer joins with practical examples, showing how matches, nulls, and duplicates arise; introduces self joins to pair customers by city.
Learn how to use the in command to filter customers by multiple cities and countries, such as Berlin, Germany, France, and the UK, in a single query.
Learn how the having clause, used after where and after group by, filters aggregated results like country counts, and why having is an elegant, optimized alternative.
Master the union command to append results from two queries, ensuring both return the same number of columns, in the same order, with compatible data types.
Master the any and all SQL commands with subqueries to filter products by related order details, understand why subqueries return a single column, and manage duplicates.
Explore the definition of a random variable as a variable whose value is the unknown outcome of an experiment, with discrete and continuous examples like dice faces and rain indicators.
Explore percentile, range, and quartiles, and learn to identify the median (50th percentile), the lower and upper quartiles (25th and 75th), and the interquartile range.
Explore how standard deviation and variance quantify data spread around the mean, compare population and sample formulas, and apply the coefficient of variation to assess variability.
Explain why sample standard deviation uses n minus one, not n, and how using the sample mean instead of the population mean biases the sample variance.
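The effect of dividing by n versus n minus one can be checked on a small sample. This sketch uses Python's statistics module and invented data values.

```python
import statistics

sample = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up data; mean is 5
n = len(sample)
mean = statistics.mean(sample)
sum_sq_dev = sum((x - mean) ** 2 for x in sample)   # 32

population_var = sum_sq_dev / n        # divide by n
sample_var = sum_sq_dev / (n - 1)      # divide by n - 1 (Bessel's correction)

# The statistics module exposes both conventions directly.
assert population_var == statistics.pvariance(sample)
assert abs(sample_var - statistics.variance(sample)) < 1e-12

print(population_var, sample_var)
```

The n minus one version is always the larger of the two, which compensates for deviations being measured from the sample mean rather than the unknown population mean.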
Explore the normal (gaussian) distribution, its properties, and how to standardize it into a unit normal via z-scores, linking to chi-square distribution.
Explore the chi square distribution, the sum of squares of k independent standard normal variables, with degrees of freedom guiding its shape and table-based probabilities for categorical associations.
Explore how the chi-square distribution tests the association between two categorical variables, comparing observed and expected values, formulating null and alternative hypotheses, and interpreting results with degrees of freedom.
Explore correlation and association between variables using the Pearson coefficient, scatter plots, and covariance concepts, then compare linear and monotonic relationships with Spearman rank.
Explore two-dimensional scatter plots using matplotlib and seaborn, color-coding iris species to reveal separations between Setosa and Versicolor versus Virginica, and learn multiple plotting approaches.
Explore pair plots to compare all four features across every pair, revealing six unique plots and a distribution plot, with petal length and petal width best separating Setosa from others.
Learn how to visualize a feature on the x axis with a 1D scatter plot, separating Setosa, Virginica, and Versicolor by color, and relate to histogram, pdf, and cdf concepts.
Explore histograms and pdfs, where the total area under the curve equals one and the area over an interval gives the probability of falling within it.
Explore the Haberman survival dataset through exploratory data analysis with matplotlib and seaborn, inspecting 306 records and four features: age at operation, year, nodes, and survival status.
Explore univariate analysis of age, year, and nodes using histograms and distribution plots to reveal substantial overlap between survival and non-survival, with nodes under four indicating higher survival chances.
Dive into the DonorsChoose Kaggle dataset, learn hands-on data analysis with Python, and predict proposal approvals while honing data storytelling and interpretation.
Explore the Kaggle data by analyzing the train and resource CSVs. Map project IDs to connect resource needs, quantities, and prices with metadata like teacher and state.
Explore univariate analysis by grouping applications by state to compute approval percentages and rank states by acceptance. Also analyze prefixes and grades for their effect on approvals.
Apply univariate analysis to clean and normalize project subcategories into single terms like literacy_language, then count and sort their occurrences with a Counter to reveal literacy_language as the top category.
Learn how linear algebra solves for unknowns in systems of equations, using a bank chase example to connect speeds, head start, and vector and matrix concepts.
Learn how to compute a vector's magnitude using the L2 norm (Euclidean distance) and convert any vector to a unit vector by dividing by its magnitude, illustrated with examples.
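As a quick illustration of the magnitude and unit-vector steps, here is a sketch in plain Python. The helper names l2_norm and unit_vector are chosen for this example only.

```python
import math

def l2_norm(v):
    """Euclidean length of v: square root of the sum of squared components."""
    return math.sqrt(sum(x * x for x in v))

def unit_vector(v):
    """Scale v to length one by dividing each component by the magnitude."""
    norm = l2_norm(v)
    if norm == 0:
        raise ValueError("the zero vector has no direction")
    return [x / norm for x in v]

v = [3.0, 4.0]
magnitude = l2_norm(v)     # 5.0 for the 3-4-5 triangle
u = unit_vector(v)         # [0.6, 0.8]

print(magnitude, u, l2_norm(u))
```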
Learn how to perform vector addition and subtraction: ensure equal lengths and formats, add element-wise, apply dot product, and extend to n dimensions with x_i + a_i.
Explore how the dot product of vectors works and why it matters in data analysis, data science, and machine learning, with rules for compatibility and scalar results.
Explore distributive properties of vectors and scalars, and learn how the angle between two vectors is defined and measured, including multiple configurations and the smallest angle.
Explore the line equation in vector form, where W^T x = 0 defines a line through the origin and its perpendicular W, using dot products and origin-shifted cases.
Explore projecting a vector onto a line in the plane using magnitude and angle, including projections on axes and arbitrary lines with cos theta and sine theta relationships.
Determine the positive or negative side of a line using the signed distance w^T x / ||w|| and its extension to circles, spheres, and higher dimensions.
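The sign test can be sketched in a few lines for a line through the origin. The function name signed_distance and the sample points are assumptions made for illustration.

```python
import math

def signed_distance(w, x):
    """Signed distance w^T x / ||w|| from point x to the line w^T x = 0."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    norm = math.sqrt(sum(wi * wi for wi in w))
    return dot / norm

w = [3.0, 4.0]                                 # normal vector of the line 3x + 4y = 0

positive_side = signed_distance(w, [3.0, 4.0])    # same direction as w
negative_side = signed_distance(w, [-3.0, -4.0])  # opposite direction
on_the_line = signed_distance(w, [4.0, -3.0])     # perpendicular to w

print(positive_side, negative_side, on_the_line)
```

A positive result places the point on the side that w points toward, a negative result places it on the opposite side, and zero means the point lies on the line.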
Explore how to compute a matrix's inverse to perform division, and master minors, cofactors, and determinants, including 2x2 and 3x3 expansion methods.
Explore dimensionality and why reduction aids visualization. Learn to represent data as column vectors and matrices, with rows as points and columns as features, using X and X transpose.
Compute covariance and variance from data sets and matrices. Understand how to treat x and y vectors, and how column-wise means and a covariance matrix S form the covariance calculation.
Define the data matrix and mean vector, standardize to zero mean and unit variance, then project points onto a unit vector mu to maximize variance.
Formulate PCA as a constrained optimization with the covariance matrix S; solve S mu = lambda mu to obtain eigenvalues and eigenvectors, select top components, and project X onto these axes.
Explore a real data set and dimensionality in visualization by connecting Google Colab to Google Drive, accessing the MNIST train CSV from Kaggle, and reading it with pandas.
Discover the Bernoulli distribution, a discrete two-outcome model where one outcome has probability p and the other 1-p, and its use in Bernoulli trials.
Compute the expected value for a Bernoulli distribution by multiplying outcomes 0 and 1 by their probabilities 1-p and p and summing, yielding E[X] = p.
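The calculation is short enough to write out directly. The function name below is illustrative, and p = 0.3 is an arbitrary example value.

```python
def bernoulli_expected_value(p):
    """Expected value of a Bernoulli variable: 0 * P(X=0) + 1 * P(X=1)."""
    if not 0 <= p <= 1:
        raise ValueError("p must be a probability between 0 and 1")
    return 0 * (1 - p) + 1 * p

expected = bernoulli_expected_value(0.3)
print(expected)
```

The outcome 0 contributes nothing to the sum, so the result is simply the probability of the outcome 1.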
Explore the normal (gaussian) distribution, a bell-shaped, continuous, symmetric curve where the area under the curve equals one, and learn how mean and standard deviation shift it to define probabilities.
Explore how histograms and cumulative distributions connect, showing how the CDF reflects the area under the PDF and how left-right areas relate in normal distributions.
Explore normal distribution with a custom Excel utility to visualize how mu and sigma shape the PDF and CDF, compute them step by step, and illustrate the 68-95-99.7 rule.
Understand the standard normal distribution by transforming any normal distribution to mean zero and standard deviation one, enabling use of a z table by subtracting mu and dividing by sigma.
Learn how the z score standardizes any normal distribution to a unit normal, using (x−μ)/σ, and how the area under the pdf relates to probabilities between x and y.
Master how to read a z score table, interpret positive and negative z values, and use left and right area concepts under the normal distribution to solve problems.
Analyze the normal distribution with mean 16.3 and sd 0.2 by calculating z scores for pizza sizes, then find right-tail probability above 16.5 and the interval probability using the z-table.
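The right-tail part of this problem can be cross-checked without a z-table. This sketch assumes Python 3.8 or later, where statistics.NormalDist is available.

```python
from statistics import NormalDist

sizes = NormalDist(mu=16.3, sigma=0.2)

# z score of 16.5: how many standard deviations above the mean it sits.
z = (16.5 - sizes.mean) / sizes.stdev

# Right-tail probability P(X > 16.5) is one minus the left-tail area.
right_tail = 1 - sizes.cdf(16.5)

print(round(z, 2), round(right_tail, 4))
```

The z score comes out at 1, and the right tail is roughly 0.159, in line with what a z-table lookup for z = 1.00 gives.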
Apply z-score calculations to a normal distribution with mean 70 and sd 5, read the z-table to estimate probabilities for x<65, x>75, and 65≤x≤75, then convert to counts.
Use z-score methods for a normal distribution to determine mu and sigma from P(X<30)=0.15 and P(X>50)=0.10 in a battery lifespan example.
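Each probability statement fixes one z score, which gives two linear equations in mu and sigma. The sketch below solves them with statistics.NormalDist (Python 3.8+) in place of a z-table, so its results may differ slightly from table-rounded answers.

```python
from statistics import NormalDist

std_normal = NormalDist()             # mean 0, standard deviation 1

z_low = std_normal.inv_cdf(0.15)      # z with 15% of the area to its left
z_high = std_normal.inv_cdf(0.90)     # P(X > 50) = 0.10 means 90% lies to the left

# Two equations:  30 = mu + z_low * sigma   and   50 = mu + z_high * sigma
sigma = (50 - 30) / (z_high - z_low)
mu = 30 - z_low * sigma

# Check the fitted distribution against the two given probabilities.
fitted = NormalDist(mu, sigma)
print(round(mu, 2), round(sigma, 2))
print(round(fitted.cdf(30), 4), round(1 - fitted.cdf(50), 4))
```

Subtracting the first equation from the second eliminates mu, which is why sigma is computed first.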
The central limit theorem states that, for any population with finite variance, samples of size over 30 yield a distribution of sample means that is approximately normal and centered at the population mean, with standard deviation equal to the population standard deviation divided by sqrt(n).
Explore the central limit theorem: for samples of size 30 or more drawn from any population, the distribution of sample means is approximately normal with mean mu and standard deviation sigma over sqrt(n); if the population is normal, any sample size suffices.
Apply the central limit theorem to find the probability that the sample mean of 49 shoppers, with population mean 448 and standard deviation 21, lies between 441 and 446, yielding about 24%.
Explore discrete and continuous uniform distributions and their relation to the normal distribution. See the discrete uniform's equal-probability finite outcomes, like the faces of a die, and how the continuous uniform's constant height is set so that the total area equals one.
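The shopper problem can be reproduced numerically as a check. This sketch assumes Python 3.8+ for statistics.NormalDist and treats the sample mean as normally distributed, as the central limit theorem suggests for n = 49.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 448, 21, 49

# Standard error of the mean: sigma / sqrt(n) = 21 / 7 = 3.
standard_error = sigma / sqrt(n)

# Approximate sampling distribution of the sample mean.
sample_mean_dist = NormalDist(mu, standard_error)

# P(441 < sample mean < 446) as a difference of two left-tail areas.
prob = sample_mean_dist.cdf(446) - sample_mean_dist.cdf(441)

print(standard_error, round(prob, 4))
```

Exact evaluation gives roughly 0.243. Z-table lookups with rounded z values land within a few tenths of a percentage point of that.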
Explore how lognormal distributions arise in everyday data, with examples from online comments, dwell time, game durations, tissue sizes, surgery times, income, citations, file sizes, and traffic.
Explore the quantile-quantile (q-q) plot to compare an unknown distribution with a known one, especially against normal and log normal distributions, by sorting data and assessing linearity.
Learn hypothesis testing by combining normal distribution and the central limit theorem, clarifying null and alternate hypotheses, alpha, and p value through experiments and data samples.
Construct a 90% confidence interval for a mean using a random sample, central limit theorem, and z-scores, illustrated with a US-India trade example.
Understand why z score and z table are limited when population standard deviation is unknown, and how the t table uses degrees of freedom and the sample standard deviation.
Conduct a one-tailed t-test of mu = 82 against mu > 82 with n = 25: a sample mean of 85 and s = 4.1 give t ≈ 3.66 with 24 degrees of freedom and p < 0.005, so reject the null.
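The test statistic in this example can be recomputed in a few lines, taking 85 as the sample mean. The critical value is hard-coded from a standard t table (one-tailed alpha = 0.005, 24 degrees of freedom) because the Python standard library has no t distribution. That choice is an assumption of this sketch, not the lecture's method.

```python
from math import sqrt

mu_0 = 82            # mean under the null hypothesis
sample_mean = 85
s = 4.1              # sample standard deviation
n = 25

# t statistic: distance of the sample mean from mu_0 in standard-error units.
standard_error = s / sqrt(n)
t_stat = (sample_mean - mu_0) / standard_error
degrees_of_freedom = n - 1

# One-tailed critical value for alpha = 0.005 with 24 degrees of freedom,
# copied from a standard t table.
t_critical = 2.797

reject_null = t_stat > t_critical
print(round(t_stat, 2), degrees_of_freedom, reject_null)
```

Since the statistic exceeds the critical value for alpha = 0.005, the one-tailed p value is below 0.005.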
Explore how alpha and p-values guide hypothesis testing, defining null and alternate hypotheses, and interpreting evidence levels from p values with correct rejection and not rejecting the null.
THE COMPREHENSIVE DATA ANALYST COURSE IS SET UP TO MAKE LEARNING FUN AND EASY
This 100+ lesson course includes 20+ hours of high-quality video and text explanations covering Linear Algebra, Probability, Statistics, and Permutation and Combination. Topics are organized into the following sections:
Python Basics, Data Structures - List, Tuple, Set, Dictionary, Strings
Pandas and NumPy
Linear Algebra - Understanding what is a point and equation of a line.
What is a Vector and Vector operations
What is a Matrix and Matrix operations
Data Type - Random variable, discrete, continuous, categorical, numerical, nominal, ordinal, qualitative and quantitative data types
Visualizing data, including bar graphs, pie charts, histograms, and box plots
Analyzing data, including mean, median, and mode, IQR and box-and-whisker plots
Data distributions, including standard deviation, variance, coefficient of variation, Covariance and Normal distributions and z-scores.
Different types of distributions - Uniform, Log Normal, Pareto, Normal, Binomial, Bernoulli
Chi Square distribution and Goodness of Fit
Central Limit Theorem
Hypothesis Testing
Probability, including union vs. intersection, independent and dependent events, Bayes' theorem, and the Law of Total Probability
Hypothesis testing, including inferential statistics, significance levels, test statistics, and p-values.
Permutation with examples
Combination with examples
Expected Value
DonorsChoose case study
AND HERE'S WHAT YOU GET INSIDE OF EVERY SECTION:
We will start with basics and understand the intuition behind each topic.
Video lecture explaining the concept with many real-life examples so that the concept is drilled in.
Walkthrough of worked-out examples to see different ways of asking questions and solving them.
Logically connected concepts which slowly build up.
Enroll today! Can't wait to see you on the other side as we go through this carefully crafted course, which will be fun and easy.
YOU'LL ALSO GET:
Lifetime access to the course
Friendly support in the Q&A section
Udemy Certificate of Completion available for download
30-day money back guarantee