
This course includes our updated coding exercises so you can practice your skills as you learn.
Leverage ChatGPT to accelerate data science and machine learning projects with prompting strategies and practical model selection. Master fitting, evaluation, and visualization using scikit-learn, matplotlib, seaborn, statsmodels, and Python pandas.
Explore how to analyze Titanic survival factors with ChatGPT by using a pandas workflow in Python, from loading data to calculating survival rates by gender, passenger class, and age group.
Apply seven tips to maximize learning in this course, including checking the overview and prerequisites, exploring course content, using the AI assistant, and practicing with downloads and exercises.
Explore a complete data science case study from pandas data wrangling to explanatory analysis, regression, classification, and clustering, powered by ChatGPT and Python.
Download all course materials, such as data sets and Jupyter notebooks, via the resources tab, then unzip the course materials archive to access the section four intro project and later sections.
Explore how ChatGPT and GPT models work. Learn how they are trained on vast web data, use pattern recognition, generate text, browse the web, and read and write code.
Compare ChatGPT and search engines, highlighting direct, conversational answers versus web links; emphasize keywords, browsing, and future integration of AI models with search tools for a unified query experience.
Compare artificial intelligence like ChatGPT to human intelligence, focusing on similarities and differences. Highlight non-determinism, data scale, speed, and private information considerations in finance and investing.
Create or log in to ChatGPT on the OpenAI site, sign up with email or Google/Microsoft/Apple ID, enable two-factor authentication, and start with GPT-3.5 in the chat interface.
Review the July 2024 updates to ChatGPT, covering the latest models, web app changes, and Free versus Plus access, including data analysis, file uploads, vision, web browsing, DALL-E, and custom GPTs.
Explore the differences between ChatGPT and GPT models, including GPT-3.5 and GPT-4, and learn how ChatGPT Plus, plugins, web browsing, and the API expand functionality.
Explore the OpenAI website to discover ChatGPT and other GPT-based models like DALL·E, learn about safety, data privacy, and API pricing, and see how tokens power GPT-4 and fine-tuning.
Explore how tokens convert text into numbers for GPT models, and how token limits, tokenization, and token IDs—along with pricing—shape outputs.
Learn how explicit instruction in prompts improves ChatGPT responses, with guidance on clarity, specificity, relevance, and conciseness, illustrated via a data science project prompt using the Titanic dataset.
Explore iterative refinement as a prompting technique for data science tasks, providing feedback, clarifications, and follow-up questions to produce detailed Python code for data preprocessing and exploratory data analysis.
Explore how target audience, detail level, and response format shape prompts, and see practical distinctions between supervised and unsupervised learning with audience-tailored explanations and formats like bullet points and tables.
Install the Python data science ecosystem with Anaconda to manage dependencies and work with Jupyter notebooks, PyCharm, or Spyder; download the graphical installer for Windows, Mac, or Linux.
Open the Anaconda Navigator, launch Jupyter Notebook, and explore the base environment with numpy and pandas; run cells with Shift+Enter or Alt+Enter, rename notebooks, and view conda list.
Explore the features of Jupyter notebook as a Python environment, including code and markdown cells, edit and command modes, run cells with shortcuts, and kernel controls to manage memory.
Explore an unknown appointments data set with pandas and ChatGPT. Load CSV, inspect structure, brainstorm analysis ideas, perform cleaning and feature engineering, and compare GPT-3.5 with GPT-4.
Explore how GPT upgrades affect prompting strategies in Python data science, noting higher token limits and faster responses, with GPT-4o able to execute code using numpy, pandas, and matplotlib.
Load and inspect the appointments dataset from Vitoria, Brazil, identify A/B testing ideas, perform cleaning and feature transformation, and deliver a cleaned dataset with 1–2 project ideas using GPT as your assistant.
Learn to provide a CSV dataset to ChatGPT, comparing GPT-4o file upload with GPT-3.5 copy-paste limits, and sample tokens using a tokenizer for effective data inspection.
Inspect a medical appointments dataset with pandas and GPT-powered analysis to distinguish observations from properties, and identify columns such as patient id, gender, scheduled day, age, neighborhood, and no-show status.
Brainstorm with ChatGPT to plan exploratory data analysis and machine learning on an appointments dataset, exploring no-show prediction, patient segmentation, time series, and data cleaning steps.
Learn to perform data cleaning in Python with pandas, including handling missing values, removing duplicates, and capping outliers, then prepare for feature engineering and selection in a no-show prediction dataset.
Engineer features and transform data by creating a time gap feature, encoding categoricals, dropping unused ids, and extracting weekday, month, and year from date columns.
Upload the appointments.csv with GPT-4, read it with pandas, inspect the columns, and run code in a Jupyter notebook for interactive data analysis.
Learn to perform an initial data inspection of a medical appointments dataset with pandas, including info and describe, while exploring no-show analysis and GPT-4 driven brainstorming.
Learn data cleaning with GPT-4 using pandas to fix missing and incorrect values, convert dates to datetime, remove duplicates, and address age outliers for reliable analysis.
Engineer features and transform data with GPT-4 by creating new features, encoding categoricals, and extracting date components. Examine the final data frame and code execution notes.
Save the cleaned appointments dataset to a local CSV, download it via the link, and verify the file named cleaned appointments with one-hot encoded features and neighborhood dummies.
Compare GPT-3.5 and GPT-4’s pros, cons, and costs, including file upload and code execution. Emphasize critical evaluation and troubleshooting in data cleaning with ChatGPT.
Learn advanced data wrangling with pandas using ChatGPT, including indexing, sorting, filtering, type conversions, and grouping. Compare GPT-3.5 and GPT-4 for robust, efficient code and sanity checks.
Solve a multi-task data wrangling project by loading the Brazilian healthcare appointments dataset, applying Python and pandas to 11 tasks in a Jupyter notebook, and explaining code to clients.
Load the appointments data from CSV into a pandas DataFrame named df, parse appointment day and scheduled day as datetime, and sort by appointment day then scheduled day.
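As a minimal sketch of the loading step, the snippet below reads a tiny in-memory CSV instead of the real file. The column names `PatientId`, `ScheduledDay`, and `AppointmentDay` and the three sample rows are assumptions made for illustration, not the course's actual data.

```python
import io

import pandas as pd

# Hypothetical three-row stand-in for the appointments CSV (illustrative only).
csv_text = """PatientId,ScheduledDay,AppointmentDay
1,2016-04-29 18:38:08,2016-05-02
2,2016-04-27 08:36:51,2016-04-29
3,2016-04-26 10:00:00,2016-04-29
"""

# parse_dates converts both day columns to datetime64 while reading.
df = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=["ScheduledDay", "AppointmentDay"],
)

# Sort by appointment day first, then by scheduled day to break ties.
df = df.sort_values(["AppointmentDay", "ScheduledDay"])

print(df)
```

With a real file, `io.StringIO(csv_text)` would simply be replaced by the path to the CSV.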
Cast the patient ID from float64 to strings without decimals to avoid scientific notation. Prevent overflow by converting floats to ints then strings, or using a lambda with apply in pandas.
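One way to sketch the float-to-string cast is shown below. The column name `PatientId` and the sample values are assumptions; the two-step `astype` chain is one of the approaches the description mentions, not necessarily the exact code used in the lecture.

```python
import pandas as pd

# Hypothetical sample: large IDs stored as float64 (illustrative values only).
df = pd.DataFrame(
    {"PatientId": [29872499824296.0, 558997776694438.0, 4262962299951.0]}
)

# float64 -> int64 drops the trailing ".0"; int64 -> str then gives plain
# digits instead of scientific notation such as 2.98725e+13.
df["PatientId"] = df["PatientId"].astype("int64").astype(str)

print(df["PatientId"].tolist())
```

Going through int64 (rather than the default platform integer) is what keeps large IDs from overflowing on systems where the default integer is 32-bit.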
Learn to convert the no show column to a numeric 0/1 feature in pandas using int64 mapping, then compute the average no show rate with mean, with Python code examples.
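A minimal sketch of this conversion, assuming the column is called `No-show` and holds the strings "Yes" and "No" (both assumptions for illustration):

```python
import pandas as pd

# Hypothetical sample of the no-show column (illustrative values only).
df = pd.DataFrame({"No-show": ["No", "Yes", "No", "No", "Yes"]})

# Map the labels to 0/1 and store them as int64.
df["No-show"] = df["No-show"].map({"No": 0, "Yes": 1}).astype("int64")

# The mean of a 0/1 column is the share of ones, i.e. the no-show rate.
no_show_rate = df["No-show"].mean()

print(no_show_rate)
```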
Reverse one-hot encoding of the neighborhood feature in a pandas workflow by aggregating dummy columns into a single neighborhood column, then drop the dummies and validate with value counts.
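The reversal can be sketched with `idxmax`, which returns the label of the dummy column holding the 1 in each row. The `Neighbourhood_` prefix, the column names, and the sample rows are assumptions for illustration.

```python
import pandas as pd

# Hypothetical frame with three one-hot neighborhood columns.
df = pd.DataFrame(
    {
        "Age": [34, 51, 8],
        "Neighbourhood_CENTRO": [1, 0, 0],
        "Neighbourhood_JARDIM CAMBURI": [0, 1, 0],
        "Neighbourhood_MARIA ORTIZ": [0, 0, 1],
    }
)

prefix = "Neighbourhood_"
dummy_cols = [col for col in df.columns if col.startswith(prefix)]

# For each row, take the name of the dummy column with the largest value
# (the one set to 1) and strip the prefix to recover the category.
df["Neighbourhood"] = df[dummy_cols].idxmax(axis=1).str[len(prefix):]

# The dummies are now redundant.
df = df.drop(columns=dummy_cols)

# Sanity check: one count per recovered category.
print(df["Neighbourhood"].value_counts())
```

This assumes every row has exactly one dummy set to 1; rows with all zeros would silently be assigned the first dummy column, so a check on the row sums is worth adding for real data.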
Rename the neighborhood name column in a DataFrame, save changes in place, and write the updated data to a new CSV while starting a fresh GPT-3.5 session.
Select the needed columns for the appointments dataset in a specific sequence, removing unused columns and using pandas indexing to update the data frame.
Determine the number of unique patients and extract the patient with the highest number of appointments in a medical appointments dataset using pandas, including counts and top-patient output.
Aggregate the appointments dataset into a patient-level dataframe with one row per unique patient. Compute mean age and mean time gap, summarize hypertension, and derive no-show rate by grouping.
Filter the patient data set to find patients with ten or more appointments and a no show rate of 100% using pandas with two conditions.
Create a previous no show feature in an appointments data frame using pandas, summing no show events per patient while excluding the current appointment and handling missing values.
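One possible sketch of such a feature uses a per-patient `shift` followed by a cumulative sum. The column names and sample rows are assumptions, and the lecture may build the feature differently.

```python
import pandas as pd

# Hypothetical appointments for two patients (illustrative values only).
df = pd.DataFrame(
    {
        "patient_id": ["a", "a", "a", "b", "b"],
        "appointment_day": pd.to_datetime(
            ["2016-05-02", "2016-05-09", "2016-05-16", "2016-05-03", "2016-05-10"]
        ),
        "no_show": [1, 0, 1, 0, 1],
    }
)

# Order each patient's appointments chronologically before accumulating.
df = df.sort_values(["patient_id", "appointment_day"])

# shift() moves each value down one row within the patient, so the current
# appointment is excluded; the first row becomes NaN and is filled with 0.
df["previous_no_shows"] = (
    df.groupby("patient_id")["no_show"]
    .transform(lambda s: s.shift().fillna(0).cumsum())
    .astype("int64")
)

print(df)
```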
Identify why negative time gap values appear between scheduled and appointment days, then fix them by converting to date-only and recalculating the time gap with pandas, with ChatGPT guidance.
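The sketch below reproduces one plausible cause of negative gaps: the scheduled timestamp carries a time of day while the appointment timestamp is midnight, so a same-day booking looks like it happened after the appointment. Column names and values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical rows: the first is a same-day booking made at 18:38.
df = pd.DataFrame(
    {
        "scheduled_day": pd.to_datetime(
            ["2016-04-29 18:38:08", "2016-04-27 08:36:51"]
        ),
        "appointment_day": pd.to_datetime(
            ["2016-04-29 00:00:00", "2016-05-02 00:00:00"]
        ),
    }
)

# Naive gap: midnight minus 18:38 is negative, so .dt.days reports -1.
raw_gap = (df["appointment_day"] - df["scheduled_day"]).dt.days

# Fix: drop the time of day on both sides, then recompute the gap.
df["time_gap"] = (
    df["appointment_day"].dt.normalize() - df["scheduled_day"].dt.normalize()
).dt.days

print(raw_gap.tolist(), df["time_gap"].tolist())
```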
Remove special characters from all column headers by renaming them with a function and regular expressions, fixing the no show column index and enabling reliable access.
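A minimal sketch of renaming headers with a function and a regular expression. The header names and the helper `clean_header` are hypothetical; which characters count as "special" is also an assumption (here: anything other than letters, digits, and underscores).

```python
import re

import pandas as pd

# Hypothetical headers containing a hyphen and a space.
df = pd.DataFrame(columns=["PatientId", "No-show", "SMS_received", "Scheduled Day"])


def clean_header(name: str) -> str:
    """Remove every character that is not a letter, digit, or underscore."""
    return re.sub(r"[^0-9a-zA-Z_]", "", name)


# rename() accepts a function and applies it to each column label.
df = df.rename(columns=clean_header)

print(list(df.columns))
```

After the rename, the former `No-show` column can be reached with attribute access (`df.Noshow`), which a hyphenated name does not allow.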
Learn to identify and avoid the SettingWithCopyWarning by creating an explicit copy when selecting a subset of a data frame, with practical code tips.
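The explicit-copy pattern can be sketched as follows; the column names and the `adults` subset are illustrative assumptions.

```python
import pandas as pd

# Hypothetical data frame (illustrative values only).
df = pd.DataFrame({"age": [5, 40, 70], "no_show": [0, 1, 0]})

# .copy() makes the subset an independent frame, so the assignment below
# cannot be mistaken for a write into a view of df.
adults = df[df["age"] >= 18].copy()
adults["senior"] = adults["age"] >= 65

print(adults)
```

Without `.copy()`, pandas versions that emit this warning would flag the assignment to `adults["senior"]`, because it is ambiguous whether the change should propagate back to `df`.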
Learn data wrangling and manipulation with GPT-4 and GPT-4o in Python data science, uploading the full dataset, executing 11 tasks with pandas, and exporting a cleaned CSV ready for download.
Explore explanatory data analysis with ChatGPT by loading the appointments CSV, creating features, and performing univariate, bivariate, and multivariate analyses to explain no-show rates.
Load the appointments data set and conduct an EDA focused on no-show rates, performing univariate and bivariate analyses, creating features, and delivering a self-explanatory Jupyter notebook.
Load and validate the appointments data set in Python with pandas, convert date-time columns, and perform initial data overview of 17 columns to understand structure and confirm no missing values.
Brainstorm goals and outcomes for exploratory data analysis of the appointments dataset, focusing on demographics, SMS reminders, timing, health conditions, and factors influencing show-ups and no-shows to guide data-driven recommendations.
Explore feature engineering for the appointments data by creating age categories, waiting time categories, total conditions, and day-of-week flags to improve predictive modeling.
Perform univariate data analysis to understand single-feature distributions, using histograms or box plots for numerical data and bar charts for categorical data, with examples like age and no shows.
Build and visualize a correlation matrix with a heatmap as part of a multivariate data analysis, and discuss errors, kernel restarts, and essential data inspection steps.
Explore multivariate data analysis with Python by building and interpreting a correlation matrix and heatmap to uncover relationships among features, including no show, SMS reminders, age, and appointment timing.
Analyze factors driving appointment no-shows using numerical and categorical features; longer waiting times and prior no-shows increase risk, while scholarships reduce it, with little gender or weekday effect.
Analyze how categorical features influence appointment no-shows by visualizing no-show rates for gender, scholarship, SMS reminders, weekday, age, and waiting time.
Analyze how SMS reminders relate to no-show risk, focusing on waiting time, prior no-shows, age, and gender. Waiting time emerges as the key driver, guiding optimization of reminders.
Learn end-to-end data preparation and exploratory analysis in Python: load data with pandas, engineer age and waiting time categories, and visualize correlations and no-show patterns.
Group appointments by neighborhood to compute number of appointments and mean no-show rate; filter neighborhoods with at least 1000 appointments, then sort by no-show rate and plot a bar chart.
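The group, filter, and sort steps can be sketched as below. The column names and the nine sample rows are assumptions, and the threshold is lowered to 3 so the toy data survives the filter; the lecture's threshold is 1000.

```python
import pandas as pd

# Hypothetical appointments across three neighborhoods.
df = pd.DataFrame(
    {
        "neighborhood": ["A"] * 4 + ["B"] * 3 + ["C"] * 2,
        "no_show": [1, 0, 0, 0, 1, 1, 0, 1, 1],
    }
)

MIN_APPOINTMENTS = 3  # illustrative; the description uses 1000

summary = (
    df.groupby("neighborhood")
    # One row per neighborhood: appointment count and mean no-show rate.
    .agg(appointments=("no_show", "count"), no_show_rate=("no_show", "mean"))
    # Keep only neighborhoods with enough appointments.
    .loc[lambda d: d["appointments"] >= MIN_APPOINTMENTS]
    # Highest no-show rate first.
    .sort_values("no_show_rate", ascending=False)
)

print(summary)
```

With matplotlib installed, `summary["no_show_rate"].plot.bar()` would draw the bar chart from this table.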
Identify missing and relevant features that could explain no-show rates in the appointments dataset, and propose data sources and prompts to enhance predictive insights while preserving privacy.
Load data, encode categorical features, and apply logistic regression in statsmodels for multivariate analysis of no-show rate, with hypothesis testing and ChatGPT-assisted interpretation.
Load and pre-process the appointments dataset, perform multivariate analysis with hypothesis testing, and identify significant factors (p≤0.05) that influence no-show rates; deliver a Jupyter notebook with executable code and interpretation.
Feed the column headers and a subset of the appointments dataset into GPT-3.5, load the CSV with pandas, and explain columns for advanced statistical analysis.
Explore multivariate data analysis and multicollinearity to test feature significance for appointment no-shows using logistic regression, with data processing, hypothesis testing, and p-values (statsmodels, pandas) discussed, code-free for now.
Develop a multivariate data analysis with logistic regression and hypothesis testing by preparing data: feature selection, one-hot encoding of categorical variables, handling multicollinearity, and adding a constant in statsmodels.
Learn to fit a logistic regression model with statsmodels, including encoding categorical variables, preparing X and y from no_show, adding an intercept, and interpreting the results summary.
Analyze the logistic regression results from statsmodels, highlight missing values and multicollinearity as issues, note significant features by p-values, and propose removing total conditions to stabilize the model.
Drop the total conditions to fix multicollinearity and re-fit the logistic regression, yielding no missing coefficients and a pseudo R-squared of 0.12.
Use logistic regression to interpret appointment no-shows, noting significant features by p-values under 0.05, such as SMS reminders reducing no-shows and same-day appointments lowering risk.
Explore why bivariate correlations and logistic regression coefficients differ when assessing SMS reminders and appointment no-shows. Explain potential causes, including multicollinearity, interactions, and confounding factors, with practical interpretation.
Explore a ChatGPT guided project to predict no shows with a binary classification using appointments.csv, including data preprocessing, model selection, baseline evaluation, handling class imbalance, and feature importance.
Convert categorical features to category dtype to leverage LightGBM's strengths in data pre-processing. Use stratified train-test splits and LightGBM's weight parameter to address the 20% no-show imbalance.
Pre-process the appointments data, train a LightGBM baseline model with default parameters, evaluate on the test set, and discuss feature importance along with hyperparameter tuning and cross-validation.
Fit a baseline LightGBM model with default parameters, prepare data by converting categoricals to category dtype, split train and test, and fit on training data using a random state of 42.
Evaluate the baseline LightGBM model on the test set using accuracy, precision, recall, F1, and ROC AUC with scikit-learn metrics, then discuss imbalance and future tuning.
Address class imbalance in a LightGBM baseline by applying the class_weight parameter, improving minority class recall and F1, with trade-offs in precision and overall accuracy.
Tackle hyperparameter tuning for LightGBM by examining key parameters—num leaves, max depth, learning rate, n estimators, min child samples, and L1/L2 regularization—and explore Bayesian optimization as the recommended approach.
Learn to implement Bayesian hyperparameter tuning with Optuna for a LightGBM model, optimize ROC AUC, and discuss code and cross-validation practices across train, validation, and test splits.
Compare the optimized model to the baseline, noting ROC AUC improvements. Explore reasons like data complexity, class imbalance, and underfitting, with strategies such as refining search space and feature engineering.
Raise the threshold to 88% to boost precision to 75% in a hospital overbooking scenario. Highlight the recall near 0.6%, revealing the precision-recall trade-off and its business implications.
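The precision-recall trade-off behind a threshold change can be illustrated with a small NumPy sketch. The labels, the predicted probabilities, the helper `precision_recall`, and the two thresholds are all made up for illustration and are unrelated to the figures quoted in the description.

```python
import numpy as np

# Hypothetical true labels (1 = no-show) and predicted no-show probabilities.
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_prob = np.array([0.95, 0.70, 0.55, 0.30, 0.90, 0.60, 0.40, 0.20, 0.10, 0.05])


def precision_recall(y_true, y_prob, threshold):
    """Return (precision, recall) when predicting 1 for prob >= threshold."""
    y_pred = y_prob >= threshold
    tp = np.sum(y_pred & (y_true == 1))   # predicted no-show, was a no-show
    fp = np.sum(y_pred & (y_true == 0))   # predicted no-show, but showed up
    fn = np.sum(~y_pred & (y_true == 1))  # missed no-show
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return float(precision), float(recall)


p_low, r_low = precision_recall(y_true, y_prob, threshold=0.50)
p_high, r_high = precision_recall(y_true, y_prob, threshold=0.92)

print(p_low, r_low)
print(p_high, r_high)
```

Raising the threshold leaves fewer positive predictions: the ones that remain are more often correct (higher precision), but more actual no-shows are missed (lower recall).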
Learn how to extract and visualize feature importance from a trained LightGBM model in Python, identifying key predictors like age, time gap, and no-show history.
Explore unsupervised learning and clustering of patient appointment data using ChatGPT, build and evaluate a baseline model with scikit-learn, and determine the optimal cluster count via the elbow method.
Leverage ChatGPT as your personal assistant to cluster patient data using unsupervised learning in a Jupyter notebook: load patients.csv, build pipelines, fit baseline models, optimize clusters, and interpret results.
Load and inspect the patients data set, derived from the appointments CSV via aggregation, analyze no-show and SMS rates, and illustrate the data structure with ChatGPT.
Brainstorm with ChatGPT to outline a clustering workflow for patient data, compare k-means, hierarchical, and DBSCAN, determine cluster count with elbow method and silhouette score, and implement with scikit-learn.
Automate data pre-processing for k-means clustering with feature selection, one-hot encoding, scaling, imputation, and scikit-learn pipelines, before determining clusters.
Fit a two-step pipeline of data preprocessing and k-means clustering on the data, address a future warning by setting n_init explicitly, and obtain a fitted pipeline for analysis.
Analyze the clustering results from a k-means model by extracting labels and centroids, then evaluate silhouette score and cluster characteristics.
Apply the elbow method with k-means to optimize cluster count by plotting the within-cluster sum of squares for 1–10 and selecting a sweet spot near four clusters.
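A minimal sketch of the elbow computation with scikit-learn, run on synthetic data rather than the patients data set. The four well-separated blobs are an assumption chosen so the elbow is easy to see; `inertia_` is scikit-learn's name for the within-cluster sum of squares.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in data: four tight blobs in two dimensions.
rng = np.random.default_rng(42)
centers = np.array([[0, 0], [10, 0], [0, 10], [10, 10]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

# Fit k-means for k = 1..10 and record the within-cluster sum of squares.
wcss = []
for k in range(1, 11):
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    wcss.append(model.inertia_)

for k, value in enumerate(wcss, start=1):
    print(k, round(value, 1))
```

Plotting `wcss` against k would show a steep drop up to the true number of blobs and a flat tail afterwards; on real data the bend is usually less sharp and calls for judgment.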
Fit a four-cluster k-means pipeline and compute cluster means. Interpret clusters as senior patients with chronic conditions, young adults with low engagement, middle-aged high SMS interaction, and youngest healthiest group.
Explore building an XGBoost regression model on the movies dataset to predict box office revenues, starting with data inspection, wrangling, feature engineering, EDA, and tuning via pipelines and random search.
Apply XGBoost regression to a movies dataset, load and inspect data, perform explanatory data analysis, and split into training and test sets 75%/25% with random state 42 to predict revenue.
Split data into X_train, y_train, X_test, and y_test with revenue as target, build a column transformer to preprocess numerical, categorical, and boolean features, and tune an XGBoost regressor with randomized search.
Welcome to the first Data Science and Machine Learning course with ChatGPT. Learn how to use ChatGPT to master complex Data Science and Machine Learning real-life projects in no time!
Why is this a game-changing course?
Real-world Data Science and Machine Learning projects require a solid background in advanced statistics and Data Analytics, and normally you would also need to be a proficient Python coder. Do you want to learn how to master complex Data Science projects without the need to study and master all the required basics (which takes dozens if not hundreds of hours)? Then this is the perfect course for you!
What you can do at the end of the course:
At the end of this course, you will know and understand all strategies and techniques to master complex Data Science and Machine Learning projects with the help of ChatGPT! And you don't have to be a Data Science or Python Coding expert! Use ChatGPT as your assistant and let ChatGPT do the hard work for you! Use ChatGPT for:
the theoretical part
Python coding
evaluating and interpreting coding and ML results
This course teaches prompting strategies and techniques and provides dozens of ChatGPT sample prompts to
load, initially inspect, and understand unknown datasets
clean and process raw datasets with Pandas
manipulate, aggregate, and visualize datasets with Pandas and matplotlib
perform an extensive Explanatory Data Analysis (EDA) for complex datasets
use advanced statistics, multiple regression analysis, and hypothesis testing to gain further insights
select the most suitable Machine Learning Model for your prediction tasks (Model Selection)
evaluate and interpret the performance of your Machine Learning models (Performance Evaluation)
optimize your models via handling Class Imbalance, Hyperparameter Tuning & more.
evaluate and interpret the results and findings of your predictions to solve real-world business problems
master regression, classification, and unsupervised learning/clustering projects
We'll cover prompting strategies and tactics for GPT-3.5 / GPT-4o mini (free) and GPT-4 / GPT-4o (paid subscription). Know the differences and master both!
The course is organized into Do-it-yourself projects with detailed project assignments and supporting materials. At the end, you will find a video sample solution. All solutions and sample prompts are available for simple download or copy/paste!
Who is this Course for?
Data Science Beginners who have no time to learn everything from scratch
Skilled Data Scientists seeking to outsource the most time-consuming parts of their work to save time
Are you ready to be at the forefront of AI in Data Science? Enroll now and start transforming your professional landscape with AI and ChatGPT!