
In this section and lecture, we are going to talk about:
Welcome to Data Science
Understanding a workflow
First end-to-end problem: teaching the machine
First end-to-end workflow in Knime
The difference between analytics and analysis
Types of analytics: Descriptive, Predictive and Prescriptive Analytics
Sample problems for each type of analytics, and the data science approach to each
Train and test sets: splitting the data set into two partitions, and splitting strategies such as Random Sampling, Linear Sampling, Take from Top, or Spatial Sampling
Problem types by data type
The k-NN algorithm, eager vs. lazy learning strategies, hyperparameters, and the concept of distance in machine learning
k-NN implementation and the concept of class probabilities in classification
The concept of confidence intervals
Numeric distances: Manhattan, Euclidean, Minkowski, and Chebyshev
String distances: Levenshtein distance
Distances defined in a programming language, matrix distances, and date/time distances
SVM classification: hard margin vs. soft margin, and kernel types such as linear, polynomial, exponential, and radial basis function (RBF)
Why association rule mining (ARM) is on an increasing trend, and what recommender algorithms and complex event processing are
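The course builds train/test splits with Knime nodes. As a rough illustration only, the Python sketch below shows what three of the listed strategies do to a table of rows. The function names are hypothetical. The evenly spaced index rule used for linear sampling is an assumption and may not match Knime's implementation exactly. Spatial sampling is not sketched here.

```python
import random


def take_from_top(rows, train_fraction):
    """First n rows become the training set, the remainder the test set."""
    n = int(len(rows) * train_fraction)
    return rows[:n], rows[n:]


def linear_sampling(rows, train_fraction):
    """Training rows are taken at evenly spaced positions across the whole table."""
    n = int(len(rows) * train_fraction)
    if n == 0:
        return [], list(rows)
    step = len(rows) / n  # >= 1 whenever train_fraction <= 1, so indices never collide
    picked = {int(i * step) for i in range(n)}
    train = [r for i, r in enumerate(rows) if i in picked]
    test = [r for i, r in enumerate(rows) if i not in picked]
    return train, test


def random_sampling(rows, train_fraction, seed=None):
    """Rows are shuffled (reproducibly if a seed is given) before the cut."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n = int(len(rows) * train_fraction)
    return shuffled[:n], shuffled[n:]


rows = list(range(10))
print(take_from_top(rows, 0.5))    # ([0, 1, 2, 3, 4], [5, 6, 7, 8, 9])
print(linear_sampling(rows, 0.5))  # ([0, 2, 4, 6, 8], [1, 3, 5, 7, 9])
```

Take from Top is only safe when the row order carries no information, for example when rows are not sorted by class or date. Otherwise, random sampling avoids a biased split.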
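k-NN is a lazy learner. It stores the training rows and defers all work to prediction time. At that point it finds the k nearest stored rows and lets them vote. The Python sketch below is only an illustration, since the course itself uses Knime nodes. The function names and toy data are hypothetical, and Euclidean distance is an assumed choice. The class "probability" here is simply the share of the k neighbours voting for each class, which may differ from how a particular Knime node reports it.

```python
import math
from collections import Counter


def knn_predict_proba(train_X, train_y, query, k=3):
    """Lazy k-NN: nothing is fitted up front; all work happens at query time."""
    # Euclidean distance from the query to every stored training point.
    dists = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    # Keep the k closest neighbours; k is a hyperparameter chosen by the user.
    nearest = sorted(dists, key=lambda d: d[0])[:k]
    votes = Counter(label for _, label in nearest)
    # Class "probability" = share of the k neighbours voting for that class.
    return {label: count / k for label, count in votes.items()}


def knn_predict(train_X, train_y, query, k=3):
    """Predicted class = the class with the largest vote share."""
    proba = knn_predict_proba(train_X, train_y, query, k)
    return max(proba, key=proba.get)


# Hypothetical toy data: two well-separated classes.
train_X = [(1, 1), (1, 2), (2, 1), (6, 6), (6, 7), (7, 6)]
train_y = ["A", "A", "A", "B", "B", "B"]
print(knn_predict_proba(train_X, train_y, (2, 2), k=4))  # {'A': 0.75, 'B': 0.25}
```

With k=4, the three "A" points near (2, 2) outvote the single "B" point pulled in as the fourth neighbour. This shows how the choice of k changes the reported class probabilities.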
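The numeric distances listed above are closely related. Minkowski distance of order p gives Manhattan distance at p = 1 and Euclidean distance at p = 2. Chebyshev distance is its limit as p grows without bound, which is simply the largest coordinate difference. Levenshtein distance counts the minimum number of single-character insertions, deletions, and substitutions that turn one string into another. The Python sketch below illustrates the standard definitions. The function names are hypothetical and are not Knime's API.

```python
def minkowski(a, b, p):
    """Minkowski distance of order p between two equal-length numeric vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def manhattan(a, b):
    # Minkowski with p = 1: sum of absolute coordinate differences.
    return minkowski(a, b, 1)


def euclidean(a, b):
    # Minkowski with p = 2: straight-line distance.
    return minkowski(a, b, 2)


def chebyshev(a, b):
    # Limit of Minkowski as p -> infinity: the largest coordinate difference.
    return max(abs(x - y) for x, y in zip(a, b))


def levenshtein(s, t):
    """Minimum number of single-character edits turning s into t."""
    prev = list(range(len(t) + 1))  # distance from "" to each prefix of t
    for i, cs in enumerate(s, start=1):
        curr = [i]  # distance from s[:i] to ""
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # delete cs
                            curr[j - 1] + 1,      # insert ct
                            prev[j - 1] + cost))  # substitute, or keep a match
        prev = curr
    return prev[-1]


print(manhattan((1, 2), (4, 6)))         # 7.0
print(euclidean((1, 2), (4, 6)))         # 5.0
print(chebyshev((1, 2), (4, 6)))         # 4
print(levenshtein("kitten", "sitting"))  # 3
```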
The course starts with a top-down approach to data science projects. The first step covers data science project management techniques. We follow the CRISP-DM methodology, whose six phases run from Business Understanding to Deployment, and organize the course around the steps below:
Business Understanding: We cover the types of problems and business processes found in real life.
Data Understanding: We cover data types and data problems, and we visualize the data to discover patterns.
Data Preprocessing: We cover classical data problems, including handling noisy or dirty data and missing values, row and column filtering, and data integration with concatenation and joins. We also cover data transformations such as discretization, normalization, and pivoting.
Machine Learning: We cover classification algorithms such as Naive Bayes, Decision Trees, Logistic Regression, and k-NN, and prediction/regression algorithms such as linear regression, polynomial regression, and decision tree regression. We also cover unsupervised learning problems: clustering with k-means or hierarchical clustering, and association rule learning with the Apriori algorithm. Finally, we cover ensemble techniques in Knime.
Evaluation: We study success metrics in Knime: the confusion matrix, precision, recall (sensitivity), and specificity for classification; purity and the Rand index for clustering; and RMSE, RMAE, MSE, and MAE for regression/prediction problems.
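In the course, Knime computes these metrics. As an illustration only, the Python sketch below spells out the standard definitions of most metrics in the evaluation step. The function names are hypothetical. RMAE is left out because the text does not define which variant it means.

```python
import math
from collections import Counter
from itertools import combinations


def classification_metrics(tp, fp, fn, tn):
    """Precision, recall (sensitivity) and specificity from a binary confusion matrix."""
    return {
        "precision": tp / (tp + fp),    # share of predicted positives that are correct
        "recall": tp / (tp + fn),       # share of actual positives found (sensitivity)
        "specificity": tn / (tn + fp),  # share of actual negatives correctly rejected
    }


def regression_metrics(y_true, y_pred):
    """MAE, MSE and RMSE over paired true/predicted values."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    n = len(errors)
    mse = sum(e * e for e in errors) / n
    return {
        "mae": sum(abs(e) for e in errors) / n,
        "mse": mse,
        "rmse": math.sqrt(mse),
    }


def purity(clusters, classes):
    """Share of points that belong to the majority true class of their cluster."""
    total = 0
    for c in set(clusters):
        members = [cls for cl, cls in zip(clusters, classes) if cl == c]
        total += Counter(members).most_common(1)[0][1]
    return total / len(classes)


def rand_index(clusters, classes):
    """Share of point pairs on which clustering and true classes agree
    (both put the pair together, or both keep it apart)."""
    pairs = list(combinations(range(len(classes)), 2))
    agree = sum((clusters[i] == clusters[j]) == (classes[i] == classes[j])
                for i, j in pairs)
    return agree / len(pairs)
```

For example, a clustering `[0, 0, 0, 1, 1, 1]` scored against true classes `["a", "a", "b", "b", "b", "b"]` has purity 5/6. One point in the first cluster is not from that cluster's majority class.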
BONUS CLASSES
We also offer bonus classes on artificial neural networks and deep learning for image processing problems.
Warning: We are still building the course and it will take time to upload all the videos. Thanks for your understanding.