
Embark on a practical, beginner-friendly data science journey without intimidating mathematics. Learn about cutting-edge tools and methods quickly, with Q&A support today and opportunities to provide feedback.
Master data science in Python through a project-driven course using Google Colab, notebooks, and data sets. Get practical success tips, best practices, and steps to earn your certificate.
Discover how to build data science projects quickly using the minimum effective dose, with pandas for data wrangling, feature engineering, regression, classification, and visualization with seaborn and matplotlib.
Discover what data science is and how data-driven insights boost profits and reduce costs across industries. Explore data sources, data preparation, and what recruiters look for in data science roles.
Uncover the evolving data scientist profile, from education and experience to salary and technology, through Kaggle’s survey highlights, including popular tools like Jupyter notebooks, the scikit-learn library, and Amazon SageMaker.
Data scientists infer insights from data using statistics, programming, and domain knowledge; they wrangle data, visualize with dashboards, set KPIs, choose and train models, and tell stories to stakeholders.
Learn the nine key skills recruiters seek in data science applicants, from building a portfolio with data wrangling in pandas and data visualization to machine learning, statistics, and strong communication.
Discover current data science job opportunities in the US, from junior to senior roles, and the required skills like Python, statistics, and data visualization.
Explore exploratory data analysis with pandas to gain insights from real-world HR data, covering statistics, feature engineering, one-hot encoding, normalization, missing data, and data type changes, with a heat map.
Master data wrangling and exploratory data analysis with Pandas to transform messy real-world data, handle missing values, one-hot encode features, and assess correlations via heatmaps.
Import datasets and perform basic statistical analysis with pandas, exploring a real HR dataset, displaying all rows and columns, and using describe and info for a summary.
Practice calculating the mean, max, and min of the age column in the HR dataframe with pandas, selecting the column and using dot methods.
Identify missing values in a pandas dataframe with isnull, drop rows containing nulls, and fill missing monthly income with the mean.
Compute the median monthly rate from the HR data and fill missing values with it. Read the raw data again and verify updates by checking nulls.
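The missing-value steps described above can be sketched as follows. This is a minimal illustration, not the course notebook: the tiny dataframe and its column names (Age, MonthlyIncome, MonthlyRate) are assumptions made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the HR data; column names are assumptions.
hr_df = pd.DataFrame({
    "Age": [41, 49, 37, 33],
    "MonthlyIncome": [5993.0, np.nan, 2090.0, 2909.0],
    "MonthlyRate": [19479.0, 24907.0, np.nan, 23159.0],
})

# Count missing values per column.
null_counts = hr_df.isnull().sum()

# Option 1: drop every row that contains at least one null.
dropped_df = hr_df.dropna()

# Option 2: fill the gaps, using the mean for income and the median for rate.
filled_df = hr_df.copy()
filled_df["MonthlyIncome"] = filled_df["MonthlyIncome"].fillna(
    filled_df["MonthlyIncome"].mean()
)
filled_df["MonthlyRate"] = filled_df["MonthlyRate"].fillna(
    filled_df["MonthlyRate"].median()
)

# Verify that no nulls remain after filling.
print(null_counts)
print(filled_df.isnull().sum())
```

Dropping rows is simpler but discards data; filling keeps every row at the cost of introducing estimated values.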
Learn how to perform one-hot encoding to convert a categorical column into binary numerical features with pandas get_dummies, enabling reliable machine learning training.
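As a rough sketch of one-hot encoding with get_dummies, assuming a hypothetical EducationField column with three made-up categories:

```python
import pandas as pd

# Hypothetical categorical column; the values are made up for illustration.
df = pd.DataFrame(
    {"EducationField": ["Life Sciences", "Medical", "Marketing", "Medical"]}
)

# Count the distinct categories before encoding.
n_unique = df["EducationField"].nunique()

# Replace the categorical column with one 0/1 column per category.
encoded = pd.get_dummies(df, columns=["EducationField"], dtype=int)
print(encoded)
```

Each row ends up with exactly one 1 across the new columns, which is what lets a model consume the category as numbers.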
Complete optional practice three by cleaning the HR data, removing missing values, counting unique education field categories, and applying one-hot encoding with get_dummies. The next lesson covers scaling, normalization, and standardization.
Learn how feature scaling improves machine learning by applying normalization and standardization. Compare min-max scaling and z-score methods to align features like rates, employment, and stock prices.
Apply feature scaling by normalizing the age column to a 0–1 range using min-max scaling with fit_transform. The next task covers standardization.
Practice opportunity four guides feature scaling with standardization using scikit-learn's StandardScaler, applying fit_transform and reshaping, then verifying a zero mean and unit standard deviation.
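The two scaling approaches can be compared side by side. The sketch below assumes a small made-up age array and uses scikit-learn's MinMaxScaler and StandardScaler; both expect a 2D input, hence the reshape.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical ages; reshape to a single-column 2D array for the scalers.
age = np.array([18, 30, 42, 60], dtype=float).reshape(-1, 1)

# Normalization: rescale to the 0-1 range.
age_minmax = MinMaxScaler().fit_transform(age)

# Standardization: subtract the mean and divide by the standard deviation.
age_standard = StandardScaler().fit_transform(age)

print(age_minmax.min(), age_minmax.max())
print(age_standard.mean(), age_standard.std())
```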
Learn pandas filtering and masking to extract rows from a dataframe by criteria such as years at company over 30, department of research and development, and daily rate 800–850.
Filter the hr_df dataframe for daily rates of at least 1450 with pandas, then sum these rates to reveal the total daily cost. Preview basic EDA on employees who stayed and left.
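The filtering criteria mentioned above map onto boolean masks. This is a sketch on a hypothetical four-row hr_df; the column names and values are assumptions.

```python
import pandas as pd

# Hypothetical stand-in for hr_df; column names and values are assumptions.
hr_df = pd.DataFrame({
    "Department": ["Sales", "Research & Development",
                   "Research & Development", "Human Resources"],
    "YearsAtCompany": [6, 33, 10, 2],
    "DailyRate": [1102, 830, 1460, 1499],
})

# Single-condition masks.
long_tenure = hr_df[hr_df["YearsAtCompany"] > 30]
rd_staff = hr_df[hr_df["Department"] == "Research & Development"]

# Combine conditions with & and wrap each one in parentheses.
mid_rate = hr_df[(hr_df["DailyRate"] >= 800) & (hr_df["DailyRate"] <= 850)]

# Filter rows and sum one column in a single step.
high_rate_total = hr_df.loc[hr_df["DailyRate"] >= 1450, "DailyRate"].sum()
print(high_rate_total)
```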
Perform basic exploratory data analysis on a two-class employee dataset using pandas, splitting into left and stayed dataframes and applying describe to compare attrition groups.
Filter the data frame to include employees whose attrition is 'no', describe key metrics, then compare age, daily rate, distance from home, and environment satisfaction to infer insights.
Define a function in Python and apply it to the daily rate column of a pandas dataframe to increase values by 10% and return updated results.
Define a function to double the daily rate and add 100, apply it to a pandas dataframe, and compute the updated total daily rate for all employees.
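Both exercises follow the same pattern: write a plain Python function, then pass it to apply on the column. A minimal sketch with made-up daily rates (the function names are hypothetical):

```python
import pandas as pd

# Hypothetical daily rates for three employees.
hr_df = pd.DataFrame({"DailyRate": [1000, 800, 1200]})


def raise_by_10_percent(rate):
    """Return the rate increased by 10%."""
    return rate * 1.10


def double_plus_100(rate):
    """Return twice the rate plus 100."""
    return rate * 2 + 100


# apply calls the function once per value in the column.
raised = hr_df["DailyRate"].apply(raise_by_10_percent)
doubled = hr_df["DailyRate"].apply(double_plus_100)

print(raised.tolist())
print(doubled.sum())
```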
Plot histograms and correlations from a pandas dataframe, exploring age, income, and other features. Learn to generate a correlation matrix heatmap with matplotlib and seaborn, with annotations.
Exercise eight guides you to customize a heat map by changing the color map with Seaborn and Matplotlib, exploring options and applying cmap settings for data visualization.
Perform basic exploratory data analysis on the kyphosis dataset with pandas, including statistical summaries, a correlation matrix, age conversion, and standardization versus normalization.
Load and explore the kyphosis dataset with pandas and seaborn, convert ages to years, and compute descriptive stats and correlations. Apply min-max scaling and standardization.
Celebrate completing today's data science in Python lesson, stay motivated, rest, and prepare for tomorrow's brand new challenge.
Demonstrate cryptocurrency price visualization and daily returns with Seaborn and Matplotlib in Colab and Jupyter notebooks, including plots, pie charts, subplots, pair plots, count plots, and heat maps.
Analyze cryptocurrency and stock prices with Matplotlib and Seaborn to visualize data and daily returns. Explore line plots, heatmaps, histograms, scatterplots, and other key visualizations.
Learn to tell data stories with visualization using scatterplots and bubble charts for relationships, bar charts for comparisons, and histograms, box plots, pies, and stacked charts for distributions and compositions.
Plot pie charts with matplotlib in a Jupyter notebook by building a pandas dataframe of crypto allocations, guided by practice opportunities and solution reviews.
Allocate 60% to Ripple (XRP) and evenly divide the rest among Bitcoin, Litecoin, Ada, and Ethereum; plot the pie chart and use explode to separate Ripple from the others.
Plot single and multiple line plots of crypto prices from CSV data using pandas and matplotlib, visualizing Bitcoin, Ethereum, and Ada over time.
Plot Bitcoin, Ethereum, and Ada using subplots for separate crypto prices. Configure investments dataframe, plot with date on the x-axis, set a crypto prices title, and apply a fixed size.
Plot a scatterplot of Bitcoin vs. Ethereum daily returns with grid, 12 by 7 size, hot pink points at 0.5 alpha using matplotlib. It shows a strong positive correlation.
Practice opportunity three guides you to plot daily returns of Bitcoin versus Ada as a scatterplot with grid, fixed size 12 by 7, blue color, and alpha transparency.
Plot histograms of crypto daily returns using pandas and matplotlib, compute mean and standard deviation, and display blue histogram with 40 bins and a title showing the mean and sigma.
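The numbers behind such a histogram title can be computed before any plotting. The sketch below assumes daily returns are percentage changes between consecutive prices and uses four made-up prices; note that pandas' std defaults to the sample standard deviation.

```python
import pandas as pd

# Hypothetical closing prices for one coin.
prices = pd.Series([100.0, 110.0, 99.0, 99.0], name="BTC")

# Daily return in percent: (today - yesterday) / yesterday * 100.
# The first value has no previous day, so it is dropped.
daily_returns = prices.pct_change().dropna() * 100

mean_return = daily_returns.mean()
sigma = daily_returns.std()  # sample standard deviation (ddof=1)

# Text that could serve as a histogram title.
title = f"mean = {mean_return:.2f}%, sigma = {sigma:.2f}%"
print(title)
```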
Explore practice opportunity four by plotting histograms for Bitcoin and Ethereum with 60 bins on a single figure, using alpha for overlap, and compute their means and standard deviations.
Learn to plot and classify breast cancer data with seaborn, using scatter plots of mean area vs mean smoothness colored by target, and count plots to show class balance.
Plot a scatterplot of mean radius versus mean area from the cancer_df data and comment on the relationship, noting that target class zero tends to have larger values.
Learn to use seaborn to create pair plots, heatmaps, and distribution plots that reveal correlations among mean radius, mean area, and mean texture, and compare class zero and class one.
Separate the dataset into class zero and class one using the target column, then plot their mean radius distributions on the same distplot in blue and red.
Visualize stock prices for JPMorgan Chase, Procter & Gamble, Apple, and United Airlines with seaborn and matplotlib; build line plots, heatmaps, histograms, and a 3d returns view.
Walk through the capstone project solution: read stock data with pandas and visualize it with matplotlib and seaborn, using line plots, subplots, scatter plots, pie charts, histograms, correlations, and a 3D plot.
Celebrate completing today's activity. Preview an exciting new project with brand new tools and invite learners to stay tuned for tomorrow.
Build your first regression model with the scikit-learn library, exploring regression fundamentals, data splitting, training and testing, and evaluating model performance on university admission data.
Explore regression analysis and predict university admission chances using the XGBoost algorithm, training in Google Colab with simple code and inputs like GPA, total score, and recommendations.
Learn how regression estimates a continuous output from multiple input features, using house price prediction with bedrooms, bathrooms, square footage, and waterfront status.
Discover how extreme gradient boosting, or XGBoost, builds a series of models to correct previous errors for regression and classification, delivering fast, memory-efficient, robust predictions with tunable hyperparameters.
Import key libraries, upload and load the university admission and life expectancy datasets, and preview columns, shapes, and data types to set up regression for predicting admission chances.
Compute the highest, average, and lowest total score using Pandas and NumPy, verify data with shape and missing values, and preview exploratory data analysis and visualization.
Perform exploratory data analysis and visualization on a university admission dataset, using heatmaps for missing values and scatter plots to relate total score, GRE score, and GPA to admission chances.
Plot the correlation matrix and comment on it using an annotated heatmap. Observe strong positive correlations, such as between chances of admission and scores (around 0.81 to 0.88).
Prepare data for model training by separating inputs and the target. Convert to NumPy arrays and reshape for the XGBoost algorithm with seven features from 1000 samples.
Split the data into 25% testing and 75% training with scikit-learn's train_test_split, verify the shapes of the resulting sets, and explain why the scaling code is commented out for the XGBoost algorithm.
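A 75/25 split can be sketched with train_test_split. The random arrays below only mimic the shapes mentioned above (1000 samples, seven features); they are placeholders, not the admission data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with the shapes described above.
rng = np.random.default_rng(0)
X = rng.random((1000, 7))
y = rng.random(1000).reshape(-1, 1)

# Hold out 25% of the rows for testing; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
```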
Train an XGBoost regression model and evaluate it on a 75/25 train-test split. Tune learning rate, max depth, and estimators, then assess performance with RMSE, MSE, MAE, and R-squared.
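The four regression metrics can be illustrated independently of any model. This sketch uses four made-up true values and predictions with scikit-learn's metric functions; RMSE is taken as the square root of MSE.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical ground truth and model predictions.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.0, 8.0, 9.0])

mse = mean_squared_error(y_true, y_pred)   # mean of squared errors
rmse = np.sqrt(mse)                        # same units as the target
mae = mean_absolute_error(y_true, y_pred)  # mean of absolute errors
r2 = r2_score(y_true, y_pred)              # share of variance explained

print(mse, rmse, mae, r2)
```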
Practice retraining a boosted tree model with lower max depth to see how depth affects coefficient of determination, with R squared dropping from ~95% to ~85%, and explore hyperparameter tuning.
Train an XGBoost regression model to predict life expectancy using World Health Organization and United Nations data, performing feature engineering, visualizations, an 80/20 train-test split, and evaluation with R-squared.
Learn to build and evaluate a capstone project solution for life expectancy prediction. Encode status, handle missing values, train a boosted regression model, and assess MSE, RMSE, MAE, and R-squared.
Conclude day three by training regression models in scikit-learn. Share your feedback and suggestions to improve future lessons in one week of data science in Python.
Train classifiers such as logistic regression, SVM, k-nearest neighbors, and random forest to predict telecom churn using tenure, gender, payment method, and evaluate with accuracy, precision, recall, and F1 score.
Predict telecom customer churn using classification algorithms such as logistic regression, AdaBoost, SVM, k-nearest neighbors, and random forest; compare models and assess performance with inputs like tenure and payment method.
Evaluate binary classification models using confusion matrices and key performance indicators. Compute accuracy, precision, recall, and ROC curve and AUC to compare logistic regression, SVM, nearest neighbors, and random forest.
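To make the confusion matrix and its derived metrics concrete, here is a sketch on ten made-up labels and predictions using scikit-learn's metric functions.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Hypothetical true labels and predictions (1 = churn, 0 = stay).
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]

# Rows are true classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()

accuracy = accuracy_score(y_true, y_pred)    # (TP + TN) / total
precision = precision_score(y_true, y_pred)  # TP / (TP + FP)
recall = recall_score(y_true, y_pred)        # TP / (TP + FN)
f1 = f1_score(y_true, y_pred)                # harmonic mean of the two

print(cm)
print(accuracy, precision, recall, f1)
```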
Import essential libraries and the telecom churn dataset to perform exploratory data analysis and build binary classification models to predict churn from 5,000 samples across 21 features.
Practice opportunity shows how to use pandas describe on telecom_df to find average daily minutes (180) and maximum (351) after performing basic EDA.
Visualize telecom data with histograms, pie charts, and a heatmap correlation matrix. Compare churn and stay with kernel density estimates of total daily charge and daily minutes.
Practice opportunity two guides students to visualize total evening charges with a kernel density estimate plot, comparing retained and churn customers using seaborn KDE.
Identify and plot feature importance from a random forest classifier on telecom data after cleaning features, then train and evaluate with a 70/30 split to reveal top churn predictors.
Learn the fundamentals of logistic regression for binary classification, compare it with linear regression, apply a sigmoid-based model, and evaluate performance with classification report and confusion matrix.
Print and comment on the confusion matrix for the logistic regression model with independent data, showing 1300 true zeros, 18 true ones, about 180 misclassifications, and 25 wrongly classified samples.
Train and evaluate a support vector machine classifier, using calibrated linear SVC, and assess performance with a classification report, confusion matrix, and metrics like precision, recall, and F1 score.
Train and evaluate a random forest classifier, an ensemble of decision trees that votes to improve accuracy and reduce overfitting, with practical guidance on reports and confusion matrices.
Train and evaluate a k-nearest neighbors classifier, compare its accuracy to the random forest model using euclidean distance and a majority vote among neighbors.
Practice calculating the euclidean distance between a(1,3) and b(2,3) and review training a k-nearest neighbor classifier, including fit, predict, and evaluation via classification report and confusion matrix.
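The distance exercise can be checked in a few lines of plain Python. The helper name below is hypothetical.

```python
import math


def euclidean_distance(p, q):
    """Return the straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


a = (1, 3)
b = (2, 3)

# sqrt((1 - 2)^2 + (3 - 3)^2)
distance_ab = euclidean_distance(a, b)
print(distance_ab)
```

The two points differ only in the first coordinate, by one unit, so the distance is 1.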
Train and evaluate a naive Bayes classifier using Bayes' theorem with priors, likelihoods, and posterior probabilities, featuring a two-feature bank retirement eligibility example and model performance insights.
Compute the probability of not retiring given the features by combining the prior probability, the likelihood, and the marginal likelihood to classify a new data point.
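The posterior calculation can be sketched as below. All probabilities are hypothetical numbers chosen for illustration, not values from the lesson; the per-class likelihood is the product of two per-feature likelihoods, which is the naive independence assumption.

```python
# Hypothetical priors for the two classes.
prior = {"retire": 0.4, "no_retire": 0.6}

# Hypothetical likelihood of the observed features given each class,
# formed as a product of two per-feature likelihoods.
likelihood = {
    "retire": 0.8 * 0.875,
    "no_retire": 0.4 * 0.5,
}

# Marginal likelihood: total probability of the observed features.
marginal = sum(prior[c] * likelihood[c] for c in prior)

# Bayes' theorem: posterior = prior * likelihood / marginal.
posterior = {c: prior[c] * likelihood[c] / marginal for c in prior}

# Classify the new point as the class with the larger posterior.
predicted_class = max(posterior, key=posterior.get)
print(posterior, predicted_class)
```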
Plot receiver operating characteristic curves for five models—logistic regression, SVM, random forest, kNN, and naive Bayes—and compare AUC, with random forest achieving about 0.91.
Apply the random forest model to test data and print the classification report. Illustrate ROC curves and AUC, report precision, recall, and accuracy, and mention grid search for hyperparameter tuning.
Predict credit card default for a Taiwan bank dataset using EDA, data preparation, and multiple classifiers (SVM, Naive Bayes, logistic, random forest, KNN) evaluated by AUC.
Prepare the UCI credit card data for model training and compare classifiers (support vector machines, naive Bayes, logistic regression, random forest, and k-nearest neighbors) using ROC AUC to identify the best model.
Encode categorical features with one-hot encoding and scale numerical features with MinMaxScaler. Split data into train and test, train XGBoost, logistic regression, and SVC, and report accuracy and confusion matrices.
Train and compare random forest, K nearest neighbor, and Gaussian Naive Bayes classifiers on the final capstone dataset, evaluating with accuracy, recall, confusion matrices, and ROC AUC curves.
Celebrate your progress as you understand the fundamentals of classification models in scikit-learn and apply that knowledge to today’s project, with a look forward to tomorrow.
Predict health care insurance costs from age, gender, BMI, children, and smoking status using AutoGluon for rapid regression and classification modeling.
Learn data science on autopilot with AutoGluon to prototype regression and classification models predicting health insurance charges from age, gender, BMI, number of children, smoking, and region.
Explore AutoGluon, an open-source library behind AWS SageMaker Autopilot, to train regression and classification models on tabular, text, and image data using simple presets and a leaderboard.
Import AutoGluon and key libraries, load the insurance dataset, and prepare the notebook for analysis. Practice initial data loading, basic visualization, and model training with AutoGluon.
Practice identifying unique regions in the insurance_df by extracting the region column and applying dot unique to reveal four regions: Southwest, Southeast, Northwest, and Northeast.
Perform exploratory data analysis (EDA) with pandas to inspect data using head, tail, describe, and info, then group by region to compare average charges, noting southeast is highest.
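Grouping by region and averaging charges can be sketched as follows. The five rows are made up, so the ranking they produce says nothing about the real dataset.

```python
import pandas as pd

# Hypothetical rows; the real insurance data has many more records and columns.
insurance_df = pd.DataFrame({
    "region": ["southwest", "southeast", "southeast", "northwest", "northeast"],
    "charges": [1800.0, 12000.0, 16000.0, 9000.0, 11000.0],
})

# Average charges per region.
mean_charges_by_region = insurance_df.groupby("region")["charges"].mean()

# Region with the highest average in this toy sample.
top_region = mean_charges_by_region.idxmax()

print(mean_charges_by_region)
print(top_region)
```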
Group data by age and examine the relationship between age and charges using a pandas data frame. Observe higher charges for older ages and prepare for regression modeling.
Visualize the insurance data with seaborn and matplotlib, checking for missing values and plotting histograms. Explore relationships using pair plots, heatmaps, and regplots for age, bmi, smoker, and charges.
Calculate and plot the correlation matrix to identify the feature with the strongest positive correlation to insurance charges, using a heatmap. Note that age shows the strongest correlation with charges.
Train regression models on autopilot with AutoGluon. Split data, specify the target charges, and compare models using R-squared to identify the best weighted ensemble.
Assess trained regression models with leaderboard, bar plots, and hold-out testing, compare predictions to ground truth, and report near 0.9 R-squared performance.
Retrain a regression model with AutoGluon using deployment-optimized presets, set a 300-second time limit, evaluate with MSE, plot the leaderboard, and identify weighted ensemble L2 as the top performer.
Train a diabetes classifier with AutoGluon using features like glucose, BMI, insulin, age, and more to predict diabetes status. Evaluate models with a leaderboard and confusion matrix for clear results.
Demonstrate a capstone solution using AutoGluon to train binary classifiers on diabetes data, perform exploratory data analysis and visualization, and evaluate models with a leaderboard and accuracy metrics.
Learn to leverage the AutoGluon library to prototype various regression and classification models and deploy the best one.
Build, train, and test a bike rental predictor using temperature, humidity, and wind speed. Compare hyperparameter optimization strategies (grid search, randomized search, Bayesian optimization) and explore the bias-variance tradeoff and L1/L2 regularization.
Build, train, and test a regression model to predict bike rental usage from weather, season, year, and other inputs; optimize its parameters and hyperparameters using the mentioned machine learning library.
Explore how hyperparameters drive model training and optimize learning rate to minimize the objective toward the global minimum, using regression on bike rental data with temperature, wind speed, and humidity.
Explore grid search, randomized search, and Bayesian optimization to tune hyperparameters like learning rate, max depth, and number of estimators, including cross-validated exhaustive and selective searches.
Import and explore datasets by loading pandas and NumPy, reading bike sharing data from CSV, inspecting features like season, weather, temperature, and date, and preparing the overall count for modeling.
Practice the first optional exercise by comparing the average casual, registered, and total bike sharing demand with pandas describe, and verify if casual plus registered mean equals the overall average.
Perform data cleaning by checking missing values with a heatmap, dropping instant and casual and registered features, and converting date to a datetime index to prepare total count for visualization.
Visualize weekly and monthly bike rental demand from a DateTime index, using scatter plots and a correlation heatmap to show temperature-count links and seasonal trends.
Plot the rental usage per quarter in Python, set the line width to six, and enable the grid to visualize demand trends across 2011–2012.
Split the data into 80% training and 20% testing, apply a one hot encoder to categorical features, concatenate with numerical features, and prepare the dataset for model training.
Train an XGBoost regression model on the split data, fit it on the training set, and evaluate with R-squared, MSE, and MAE while generating predictions on the test set.
Retrain the XGBoost algorithm at varying max depths, compare performance, and note that simpler trees improve R-squared while reducing overfitting; explore hyperparameter optimization via grid search.
Apply grid search with GridSearchCV to optimize hyperparameters (max depth, learning rate, and number of estimators) for an XGBoost regression model, improving R-squared from 0.73 to about 0.9.
Add gamma to the grid and run a five-fold cross-validated search over 162 candidates; monitor root mean squared error around 614 and coefficient of determination near 0.9.
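The grid search idea can be sketched with GridSearchCV. To stay within scikit-learn alone, this example swaps in GradientBoostingRegressor for XGBoost and uses synthetic data; the grid values are assumptions and much smaller than the one in the lesson. The number of candidates is the product of the list lengths in the grid.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, ParameterGrid

# Synthetic regression data standing in for the bike rental features.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Hypothetical grid: 2 x 2 x 2 = 8 candidate combinations.
param_grid = {
    "max_depth": [2, 3],
    "learning_rate": [0.05, 0.1],
    "n_estimators": [50, 100],
}
n_candidates = len(ParameterGrid(param_grid))

# Five-fold cross-validation per candidate, scored by (negated) RMSE.
search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid,
    scoring="neg_root_mean_squared_error",
    cv=5,
)
search.fit(X, y)

print(n_candidates)
print(search.best_params_, -search.best_score_)
```

Because every combination is fitted once per fold, the cost grows quickly with each added hyperparameter, which is the motivation for randomized and Bayesian search.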
Optimize hyperparameters with random search instead of exhaustive grid search, sampling 50 iterations with 5-fold cross-validation and evaluating RMSE, MSE, MAE, and R squared.
Explore Bayesian optimization to fine-tune gradient boosting hyperparameters, including learning rate, max depth, and estimators, using Bayesian search with cross-validation for improved model performance.
Load the used car prices dataset, split it into training and testing sets with 25% held out for testing, train a boosting model, and compare grid search, random search, and Bayesian optimization using MSE and R-squared.
Predict MSRP for used cars with a regression model using one-hot encoding, train/test split, and XGBoost, optimizing with grid, random, and Bayesian search to achieve R-squared around 0.92 and RMSE near 5000.
Celebrate mastering hyperparameter tuning in scikit-learn, and enjoy today’s project and future lessons.
Build a food recognition classifier from images with the DataRobot tool, classifying dessert, seafood, fried food, and vegetable and fruit, and explore explainable artificial intelligence.
Sign up for DataRobot, upload the food image dataset, and train a classifier for fried food, seafood, vegetables and fruits, and dessert; explore the data, set the target as class, and deploy the best model.
Train and compare six models using DataRobot, highlight the residual neural network classifier and logistic regression with L1/L2 regularization, and explore model evaluation with confusion matrices and tuning options.
Explore explainable AI with Grad-CAM visualizations and activation maps, see how DataRobot trains, selects MobileNet, and reveals model reasoning for image classifications.
Understand simple linear regression between temperature (X) and revenue (Y), fit a line with slope m and intercept b to predict Y from X, and use it for revenue predictions.
Learn how least squares finds the best fit line in simple linear regression by minimizing squared residuals to estimate the slope and intercept on training data.
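The least squares estimates for a single input have a closed form: the slope is the covariance of X and Y divided by the variance of X, and the intercept makes the line pass through the means. A sketch with made-up temperature and revenue values:

```python
import numpy as np

# Hypothetical training data: temperature (X) and revenue (Y).
temperature = np.array([10.0, 20.0, 30.0, 40.0])
revenue = np.array([250.0, 450.0, 650.0, 850.0])

x_mean = temperature.mean()
y_mean = revenue.mean()

# Slope m minimizes the sum of squared residuals.
m = np.sum((temperature - x_mean) * (revenue - y_mean)) / np.sum(
    (temperature - x_mean) ** 2
)

# Intercept b forces the line through (x_mean, y_mean).
b = y_mean - m * x_mean

# Predict revenue for a new temperature with y = m * x + b.
predicted_revenue = m * 25.0 + b
print(m, b, predicted_revenue)
```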
Explore scikit-learn, a free Python machine learning library, and master regression, classification, and clustering with preprocessing, train-test splits, and a simple linear regression example.
Explore the XGBoost algorithm in depth: build an ensemble of models that learn from residuals with a learning rate to improve predictions.
Explore XGBoost deep dive: learn gradient-boosted trees, ensemble methods, and supervised learning for regression and classification, with hyperparameter tuning, distributed training, and robustness to missing data and outliers.
Learn how boosting builds an ensemble of weak models that learn from residuals in a sequential process. See a practical example and how reweighting data improves predictions with XGBoost.
Explore ensemble learning, including bagging and boosting, to combine multiple models and vote for better predictions. Learn how decision trees form ensembles, reduce overfitting, and improve generalization.
Explore the bias-variance tradeoff, balancing model complexity to minimize both bias and variance. Learn how training and testing performance reveal generalization, guiding hyperparameter tuning toward the sweet spot.
Explore L2 regularization and ridge regression to balance bias and variance, reducing overfitting and improving generalization between training and testing data. See how alpha tunes the penalty and slope.
Explore L1 regularization (lasso) and compare it to L2, introducing an alpha penalty on the absolute slope to reduce overfitting and enable feature selection.
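The practical difference between the two penalties shows up in the fitted coefficients. In this sketch on synthetic data, only the first of five features drives the target; the alpha values are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic data: the target depends on feature 0 only, plus a little noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=200)

# L2 penalty shrinks coefficients toward zero without removing them.
ridge = Ridge(alpha=1.0).fit(X, y)

# L1 penalty can push irrelevant coefficients to exactly zero.
lasso = Lasso(alpha=0.5).fit(X, y)

print("ridge:", ridge.coef_)
print("lasso:", lasso.coef_)
```

The zeroed lasso coefficients are what is meant by feature selection: the irrelevant inputs drop out of the model entirely.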
Do you want to learn Data Science and build robust applications Quickly and Efficiently?
Are you an absolute beginner who wants to break into Data Science and look for a course that includes all the basics you need?
Are you a busy aspiring entrepreneur who wants to maximize business revenues and reduce costs with Data Science but lacks the time to do so quickly and efficiently?
This course is for you if the answer is yes to any of these questions!
Data Science is one of the hottest tech fields to be in now!
The field is exploding with opportunities and career prospects.
Data Science is widely adopted in many sectors, such as banking, healthcare, transportation, and technology.
In business, Data Science is applied to optimize business processes, maximize revenue, and reduce cost.
This course aims to provide you with knowledge of critical aspects of data science in one week and in a practical, easy, quick, and efficient way.
This course is unique and exceptional in many ways. It includes several practice opportunities, quizzes, and final capstone projects.
Every day, we will spend 1-2 hours together and master a data science topic.
First, we will start with the Data Science essential starter pack and master key Data Science Concepts, including the Data Science project lifecycle, what recruiters look for, and what jobs are available.
Next, we will understand exploratory data analysis and visualization techniques using Pandas, matplotlib, and Seaborn libraries.
Then, we will learn regression fundamentals and how to build, train, test, and deploy regression models using the scikit-learn library.
After that, we will learn about hyperparameter optimization strategies such as grid search, randomized search, and Bayesian optimization.
Next, we will learn how to train several classification algorithms such as Logistic Regression, Support Vector Machine, K-Nearest Neighbors, Random Forest Classifier, and Naïve Bayes using the SageMaker and scikit-learn libraries.
Next, we will cover Data Science on Autopilot! We will learn how to use the AutoGluon library for prototyping multiple AI/ML models and deploying the best one.
So who is this course for?
The course targets anyone wanting to gain a fundamental understanding of Data Science and solve practical, real-world business problems.
In this course:
You will have a practical, project-based learning experience. We will build over ten projects together.
You will have access to all the code and slides.
You will get a certificate of completion that you can post on your LinkedIn profile to showcase your skills in Data Science to employers.
All this comes with a 30-day money-back guarantee, so you can give the course a try risk-free!
Check out the preview videos and the outline to get an idea of the projects we will cover.
Enroll today, and let’s harness the power of Data Science together!