
Explore measures of central tendency, including mean, median, and mode, and learn how to describe data distribution, assess skewness, and choose appropriate measures for different data types.
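As a small illustration of the three measures named above, the sketch below uses Python's standard `statistics` module. The `amounts` list is made-up sample data, not taken from the course, and comparing the mean with the median is only a rough check for skew.

```python
import statistics

# Hypothetical sample: small transaction amounts plus one large outlier.
amounts = [12, 15, 15, 18, 20, 22, 250]

mean_value = statistics.mean(amounts)      # pulled upward by the outlier
median_value = statistics.median(amounts)  # middle value, robust to the outlier
mode_value = statistics.mode(amounts)      # most frequent value

# A mean well above the median hints at right (positive) skew.
is_right_skewed = mean_value > median_value

print(mean_value, median_value, mode_value, is_right_skewed)
```

With skewed data like this, the median is usually the more representative summary; the mode is the only one of the three that also applies to nominal data.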
Data visualization uses plots such as histograms, box plots, and scatter plots to reveal patterns, outliers, and correlations. Visualizing data before cleaning highlights the median and quartiles, and r-squared indicates how linear a relationship is.
Demonstrates how a confusion matrix compares actual and predicted labels, defines accuracy as (tp+tn)/total, and derives the kappa statistic (p0−pe)/(1−pe) from the observed agreement p0 and the agreement expected by chance pe.
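The two formulas above can be worked through in plain Python. This is a minimal sketch for binary labels, assuming 1 marks the positive class; the function names and the sample labels are hypothetical, not from the lecture.

```python
def confusion_counts(actual, predicted):
    """Return (tp, tn, fp, fn) for binary labels where 1 is the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Accuracy = (tp + tn) / total number of predictions."""
    return (tp + tn) / (tp + tn + fp + fn)


def kappa(tp, tn, fp, fn):
    """Kappa = (p0 - pe) / (1 - pe); undefined when pe equals 1."""
    total = tp + tn + fp + fn
    p0 = (tp + tn) / total  # observed agreement
    # Chance agreement: product of actual and predicted marginals, per class.
    pe_pos = ((tp + fn) / total) * ((tp + fp) / total)
    pe_neg = ((tn + fp) / total) * ((tn + fn) / total)
    pe = pe_pos + pe_neg
    return (p0 - pe) / (1 - pe)


# Made-up labels: 4 actual positives, 6 actual negatives.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

counts = confusion_counts(actual, predicted)
acc = accuracy(*counts)
k = kappa(*counts)
print(counts, acc, k)
```

Here accuracy is 0.8 while kappa is noticeably lower, because part of the agreement would be expected by chance alone.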
Predict payment fraud with machine learning on a payments fraud dataset from GitHub, using features like account age, transactions, local time, and payment methods, and assess model accuracy amid data imbalance.
Model a logistic regression classifier, encode categorical features with get_dummies, split data into train and test sets, and evaluate accuracy, with an option to switch to a decision tree.
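The workflow described above might look roughly like the following with pandas and scikit-learn. The tiny DataFrame and its column names (`account_age_days`, `payment_method`, `label`) are invented stand-ins for the real payments data, so the resulting accuracy is not meaningful.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the payments data; all values are hypothetical.
df = pd.DataFrame({
    "account_age_days": [1, 2, 400, 900, 3, 1200, 5, 700, 2, 1500, 4, 1000],
    "payment_method": ["card", "paypal", "card", "card", "paypal", "card",
                       "paypal", "card", "paypal", "card", "paypal", "card"],
    "label": [1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0],
})

# One-hot encode the nominal column; dtype=int keeps every feature numeric.
X = pd.get_dummies(df.drop(columns="label"), columns=["payment_method"], dtype=int)
y = df["label"]

# Hold out a quarter of the rows, keeping the class balance in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

acc = accuracy_score(y_test, model.predict(X_test))
print(f"test accuracy: {acc:.2f}")
```

Switching to a decision tree would only require replacing `LogisticRegression(max_iter=1000)` with `DecisionTreeClassifier()` from `sklearn.tree`; the encoding, split, and scoring stay the same.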
Apply logistic regression to payment fraud data, convert nominal data with one-hot encoding, and evaluate with train-test split and accuracy; swap to a decision tree for comparison.
Select 15 key features and train a random forest malware classifier, achieving 99.55% accuracy with cross-validated evaluation, and note it outperforms gradient boosting.
Develop a real-time phishing detector using logistic regression, leveraging a dataset from the UCI repository with 30 features and a label, then start coding to build the model.
Compare logistic regression and decision tree models for phishing detection, train on a labeled dataset, and report accuracy, with the decision tree achieving about 97.95 percent on testing.
Explore how POP3, IMAP, and SMTP email servers manage spam and ham, from remote storage to user fetch, organization, and deletion, with data and feature selection guiding classification.
Download the ham and spam dataset, process with pandas, vectorize text using count vectorizer, train a naive bayes classifier, and achieve about 99% accuracy with strong recall and precision.
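The vectorize-then-classify step can be sketched with scikit-learn's `CountVectorizer` and `MultinomialNB`. The six messages below are invented for illustration; the actual lecture loads the ham and spam dataset with pandas, which is skipped here, and which naive Bayes variant the course uses is an assumption.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical training messages standing in for the ham/spam dataset.
messages = [
    "win free prize now",
    "free cash offer click now",
    "claim your free prize",
    "meeting at noon tomorrow",
    "please review the attached report",
    "lunch tomorrow at noon",
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# Turn each message into a vector of word counts.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(messages)

# Multinomial naive Bayes works directly on the count features.
clf = MultinomialNB()
clf.fit(X, labels)

# New messages must go through the same fitted vectorizer.
spam_guess = clf.predict(vectorizer.transform(["free prize now"]))[0]
ham_guess = clf.predict(vectorizer.transform(["report for the meeting tomorrow"]))[0]
print(spam_guess, ham_guess)
```

On a real dataset, precision and recall can be reported alongside accuracy with `classification_report` from `sklearn.metrics`.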
Learn to build a Twitter bot detector by framing it as a binary classification problem, download and inspect the 20-column, 2797-row dataset with nominal 0/1 values, and begin coding.
Apply machine learning to a cyber attacks dataset by preparing features, dropping categorical data, and evaluating decision tree and random forest models with cross-validation for high accuracy.
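A cross-validated comparison of the two models could be set up as below. The data comes from scikit-learn's `make_classification`, a synthetic generator used here only as a placeholder for the cyber attacks dataset after its categorical columns have been dropped; the fold count and forest size are arbitrary choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic, all-numeric features standing in for the prepared dataset.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, random_state=0
)

models = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

# 5-fold cross-validation: each model is trained and scored five times.
mean_scores = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    mean_scores[name] = scores.mean()
    print(f"{name}: mean accuracy {mean_scores[name]:.3f}")
```

Averaging over folds gives a steadier estimate than a single train-test split, which matters when comparing two models that score close to each other.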
Machine learning is disrupting cybersecurity to a greater extent than almost any other industry. Many problems in cybersecurity are well suited to machine learning, as they often involve some form of anomaly detection on very large volumes of data. This course deals with the most common issues in cybersecurity, such as malware, anomaly detection, SQL injection, credit card fraud, bots, spam, and phishing. All these problems are covered in case studies.
Section 1: Statistics - Machine Learning
Lecture 1: Central Tendency (Preview)
Lecture 2: Measures of Dispersion (Preview)
Lecture 3: Data Visualization (Preview)
Lecture 4: Confusion Matrix, Accuracy and Kappa
Section 2: Case Studies
Lecture 5: Introduction to Payment Fraud (Preview)
Lecture 6: Machine Learning in Payment Fraud
Lecture 7: "NO CODING"_Machine Learning in Payment Fraud
Lecture 8: Introduction to Malware
Lecture 9: Machine Learning in Malware
Lecture 10: Introduction to Phishing
Lecture 11: Machine Learning in Phishing
Lecture 12: Introduction to IDS
Lecture 13: Machine Learning in IDS
Lecture 14: Introduction to Spam
Lecture 15: Machine Learning in Spam
Lecture 16: Introduction to Twitter Bot Detector
Lecture 17: Machine Learning in Twitter Bot Detector
Lecture 18: Introduction to Malicious SQL Injection
Lecture 19: Machine Learning in SQL Injection
Lecture 20: "NO CODE"_Machine Learning in Medical Fraud Detection (Preview)
Data.zip