
Welcome :-)
Learn the basics of the software we will use: Python, KNIME and Excel.
Let's install the KNIME Analytics Platform.
You will get to know the KNIME Analytics Platform environment.
Throughout this course we will work in the popular Jupyter Notebook, which is included in the Anaconda distribution.
Please download all 5 files in Excel and CSV formats.
We will load our data from several MS Excel and CSV files.
We will load our data from several MS Excel and CSV files.
Before we jump into the next section, I would like to describe how to work with KNIME nodes.
We will load our data from several MS Excel and CSV files.
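For the Python part, loading might look roughly like the sketch below. It assumes pandas is used; the CSV content, column names and the Excel file name are made up for illustration and are not the course files.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for one of the course files.
csv_text = "order_id,region,units\n1,North,10\n2,South,25\n3,East,40\n"

# read_csv accepts a file path or any file-like object.
orders = pd.read_csv(io.StringIO(csv_text))

# An Excel workbook is read in a similar way (needs an Excel engine such as openpyxl).
# The file name below is a placeholder, not a real course file:
# orders_xlsx = pd.read_excel("orders.xlsx", sheet_name=0)

print(orders.head())
```

With real files, pass the file path instead of the in-memory text.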
We have uploaded all files. Now it is time to join them into one data frame.
We will inspect basic information about the data frame and transpose it.
We can filter our data frame by selected parameters.
Learn how to group and sort the data.
We have uploaded all files. Now it is time to join them into one data frame.
We will inspect basic information about the data frame and transpose it.
We can filter our data frame by selected parameters.
Learn how to group and sort the data.
We have uploaded all files. Now it is time to join them into one data frame.
We will inspect basic information about the data frame and transpose it.
We can filter our data frame by selected parameters.
Learn how to group and sort the data.
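In Python, these table operations (join, inspect, transpose, filter, group and sort) could be sketched with pandas as follows. The two small tables and their column names are assumptions made for illustration only.

```python
import pandas as pd

# Hypothetical tables standing in for two separately loaded files.
sales_a = pd.DataFrame({"region": ["North", "South"], "units": [10, 25]})
sales_b = pd.DataFrame({"region": ["North", "East"], "units": [5, 40]})

# Join the tables into one data frame by stacking their rows.
combined = pd.concat([sales_a, sales_b], ignore_index=True)

# Basic information about the data frame (columns, types, non-null counts).
combined.info()

# Transpose: rows become columns and columns become rows.
transposed = combined.T

# Filter rows by a selected parameter.
large = combined[combined["units"] >= 10]

# Group by region, sum the units, and sort from largest to smallest.
totals = (
    combined.groupby("region", as_index=False)["units"]
    .sum()
    .sort_values("units", ascending=False)
    .reset_index(drop=True)
)
print(totals)
```

Here the rows are stacked with concat because both tables share the same columns; tables that share only a key column would be joined with merge instead.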
We will bin our numeric data into groups according to the boundaries we set.
After this lecture you will be able to convert data types, rename columns, add a constant value, and handle the situation where you need to go a few steps back and then execute the whole procedure again.
In this lecture we will perform basic calculations using math expressions.
Let's filter out the columns we no longer need and split columns into several.
How do we handle a common problem: missing values?
Date and time stamps quite commonly hold valuable information.
We will bin our numeric data into groups according to the boundaries we set.
After this lecture you will be able to convert data types, rename columns, add a constant value, and handle the situation where you need to go a few steps back and then execute the whole procedure again.
In this lecture we will perform basic calculations using expressions in the math formula node.
We will filter our data set using the column filter node and the missing value column node, and so learn how to filter out certain columns.
We can split our columns into more columns using several splitting nodes.
How do we handle missing values? Use the missing values node and explore its further options.
After this lecture you will be able to change data types to a date format and calculate the difference between two dates.
During this lecture you will see how easy it is to extract different information from the date and time format, e.g. year, month, day etc.
We will bin our numeric data into groups according to the boundaries we set.
After this lecture you will be able to convert data types, rename columns, add a constant value, and handle the situation where you need to go a few steps back and then execute the whole procedure again.
In this lecture we will perform basic calculations using math expressions.
Let's filter out the columns we no longer need.
How do we handle a common problem: missing values?
Date and time stamps quite commonly hold valuable information.
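As a rough Python counterpart to these column operations, the pandas sketch below walks through type conversion, renaming, a constant column, binning, missing values, a calculation, column splitting, column removal and dates. All data, column names and bin boundaries are invented for illustration, and filling with the mean is only one of several ways to treat missing values.

```python
import pandas as pd

# Hypothetical data; every column name here is an assumption.
df = pd.DataFrame({
    "customer": ["Ann Lee", "Bob Ray", "Cy Dunn"],
    "age": ["23", "41", "67"],              # numbers stored as text
    "price": [10.0, None, 30.0],            # contains a missing value
    "quantity": [2, 1, 3],
    "ordered": ["2023-01-05", "2023-02-10", "2023-03-15"],
    "shipped": ["2023-01-08", "2023-02-11", "2023-03-25"],
})

# Convert a data type, rename a column, add a constant column.
df["age"] = df["age"].astype(int)
df = df.rename(columns={"customer": "customer_name"})
df["currency"] = "EUR"

# Bin a numeric column using boundaries we choose: (0, 30], (30, 60], (60, 120].
df["age_group"] = pd.cut(
    df["age"], bins=[0, 30, 60, 120], labels=["young", "middle", "senior"]
)

# Missing values: fill the gap with the column mean.
df["price"] = df["price"].fillna(df["price"].mean())

# Basic calculation with a math expression.
df["total"] = df["price"] * df["quantity"]

# Split one column into several.
names = df["customer_name"].str.split(" ", expand=True)
df["first_name"] = names[0]
df["last_name"] = names[1]

# Filter out a column we no longer need.
df = df.drop(columns=["customer_name"])

# Dates: convert text to datetime, compute a difference, extract a part.
df["ordered"] = pd.to_datetime(df["ordered"])
df["shipped"] = pd.to_datetime(df["shipped"])
df["days_to_ship"] = (df["shipped"] - df["ordered"]).dt.days
df["order_month"] = df["ordered"].dt.month

print(df)
```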
A column chart for comparing values across different classes.
Let's look at the data over time.
After this lecture you will be able to create pie and donut charts to see how your values are split.
The scatter plot plays a very important role in data analysis because it shows the relationship between two variables: whether they are correlated, and whether the correlation is positive or negative.
Less common but very useful, the box plot depicts the distribution of your numerical values through summary statistics: median, maximum, minimum, first and third quartile.
After this video you will be able to create a histogram, and we will learn how to work with this node to get more information, options and details.
In this video we will create a line plot to see the trend of some of our numeric values and how they develop over time. I will also show you the JavaScript line plot, which offers many visual effects and produces attractive charts.
After this lecture you will be able to create pie and donut charts to see how your values are split.
The scatter plot plays a very important role in data analysis because it shows the relationship between two variables: whether they are correlated, and whether the correlation is positive or negative.
Less common but very useful, the box plot depicts the distribution of your numerical values through summary statistics: median, maximum, minimum, first and third quartile.
A column chart for comparing values across different classes.
Let's look at the data over time.
After this lecture you will be able to create pie and donut charts to see how your values are split.
The scatter plot plays a very important role in data analysis because it shows the relationship between two variables: whether they are correlated, and whether the correlation is positive or negative.
Less common but very useful, the box plot depicts the distribution of your numerical values through summary statistics: median, maximum, minimum, first and third quartile.
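The numbers behind a scatter plot and a box plot can also be checked directly. The sketch below, using pandas on invented data, computes the correlation coefficient that a scatter plot lets you judge by eye, and the five values a box plot draws.

```python
import pandas as pd

# Hypothetical measurements; the column names are assumptions.
data = pd.DataFrame({
    "hours": [1, 2, 3, 4, 5],
    "score": [52, 60, 71, 80, 95],
    "errors": [9, 7, 6, 4, 1],
})

# Pearson correlation: close to +1 is a positive relationship, close to -1 a negative one.
r_positive = data["hours"].corr(data["score"])
r_negative = data["hours"].corr(data["errors"])

# Minimum, first quartile, median, third quartile and maximum of one column.
summary = data["score"].quantile([0, 0.25, 0.5, 0.75, 1.0])

print(r_positive, r_negative)
print(summary)
```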
You will learn how to bring your data into one common range. This is a very important step in data pre-processing that helps avoid poor performance of your machine learning model.
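Two common ways to bring columns into a common range are min-max normalisation and standardisation. The sketch below shows both with pandas on invented data. Which method the lecture uses is not stated here, so treat this only as an illustration of the idea.

```python
import pandas as pd

# Hypothetical features on very different scales.
df = pd.DataFrame({
    "income": [20000.0, 40000.0, 60000.0],
    "age": [20.0, 30.0, 40.0],
})

# Min-max normalisation: every column ends up between 0 and 1.
min_max = (df - df.min()) / (df.max() - df.min())

# Standardisation (z-score): every column gets mean 0 and standard deviation 1.
z_score = (df - df.mean()) / df.std(ddof=0)

print(min_max)
print(z_score)
```

After either transformation, income and age contribute on a comparable scale instead of income dominating because of its larger numbers.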
Documents (including the KNIME and Jupyter Notebook files) are also available at:
https://1drv.ms/u/s!AolTGH3TVJWGknB78P751vjXzQ_I?e=EHc868
We will focus on the most time-consuming part of the machine learning process: data exploration, which consists of data visualisation and data wrangling and serves to prepare and understand your data.
The whole course is full of hands-on data manipulation and visualisation exercises in three popular data science platforms:
1. Python, an open-source and very progressive programming language
2. KNIME, an open-source, highly intuitive and effective analytics platform
3. MS Excel, the most popular tool among people working with data,
where we will load data, transform them and visualise them.
So, what will we cover during this course?
Start with the KNIME analytics platform (installation and environment description)
Start with Python (installation and environment description)
Gathering the data into all platforms (data from Excel and CSV)
Data manipulation (preparation and transformation) I - Table, Row transform, Row filter and split
Data manipulation (preparation and transformation) II - Column Binning, Column Convert and replace, Column Filter, Column Split, Column Transform
Data manipulation (preparation and transformation) III - Other data types – date and time.
Data manipulation - Feature scaling
Data visualisation - Histogram, Line plot, Pie chart, Scatter plot, Box plot