
Eric champions no-code data exploration and cleaning using tools like Alteryx and KNIME, guiding clients from problem framing through to turning data insights into solutions.
Use exploratory data analysis to gain data understanding, check data quantity and labeling, and identify quality issues that can derail modeling.
Explore the five core data types—double (and float), integer, string, date/time, and boolean—in KNIME and any data analysis tool, with notes on math, comparisons, and joins.
Explore summary statistics from the statistics node, including mean, standard deviation, variance, skew, and a histogram to reveal data distribution across three shifts and guide data cleaning.
Learn data cleaning through exploration, iterative fixes of strings and duplicates, and handling missing data and outliers to prepare data for modeling.
Convert a date-like string to a date type using the string to date node in the time series category. Use the format d-MM-yy, noting that capital M denotes month (lowercase m means minutes) and that the dash separator must match the data.
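For readers who want to see the same idea outside a visual tool, here is a minimal Python sketch of the conversion. It assumes the d-MM-yy pattern corresponds to Python's "%d-%m-%y" format codes; the function name and the sample value are made up for illustration.

```python
from datetime import datetime


def parse_day_month_year(text):
    """Parse a day-month-year string such as '5-03-21' into a date.

    "%m" is the month in Python's notation ("%M" would mean minutes),
    and the dashes in the pattern must match the separators in the data.
    """
    return datetime.strptime(text, "%d-%m-%y").date()


print(parse_day_month_year("5-03-21"))  # 2021-03-05
```

A string with a different separator, such as "5/03/21", raises a ValueError rather than being silently misread, which is the same reason the dash matters in the format above.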
Learn practical string manipulation to clean data: convert numbers to strings, remove periods and commas, strip exponential notation, and pad identifiers to a fixed length.
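As a rough illustration of what those string steps accomplish, the Python sketch below turns an identifier that was read as a number back into a fixed-length string. It assumes identifiers are whole numbers; the function name, the width of 8, and the sample values are hypothetical.

```python
from decimal import Decimal


def clean_identifier(value, width=8):
    """Turn a numeric-looking identifier into a zero-padded string."""
    # Convert to text and drop thousands separators such as "12,345".
    text = str(value).replace(",", "")
    # Decimal understands exponential notation ("1.2345E4") and a
    # trailing ".0"; int() then keeps only the whole-number part.
    whole = int(Decimal(text))
    # Pad on the left with zeros so every identifier has the same length.
    return str(whole).zfill(width)


print(clean_identifier(12345.0))     # 00012345
print(clean_identifier("1.2345E4"))  # 00012345
print(clean_identifier("12,345"))    # 00012345
```

Once every identifier has the same string form, joins between tables match on it reliably.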
Learn to clean up your data workflow by combining nodes into metanodes, labeling steps, and fitting large programs into a single node you can open to inspect.
Apply row filtering to remove unwanted data by color-based and rule-based criteria, using pattern matching and membership checks to exclude red and black entries and keep orange and blue.
Explore what machine learning is, including supervised, unsupervised, and reinforcement learning, and distinguish classification versus regression with examples like predicting categories and house prices.
Learn to implement a linear regression model to predict cash, diagnose issues using a correlation matrix, and select predictors by removing highly correlated features, using an 80/20 partition for training and testing and R-squared to evaluate the fit.
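The predictor-selection idea can be sketched in a few lines of plain Python. This is one simple way to do it under stated assumptions, not necessarily the procedure used in the course: the 0.9 threshold, the keep-the-first-column rule, and the column names are all hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def drop_correlated(columns, threshold=0.9):
    """Keep a predictor only if it is not highly correlated with one already kept."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical predictors: "b" moves almost in lockstep with "a".
predictors = {
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 11],
    "c": [5, 1, 4, 2, 3],
}
print(drop_correlated(predictors))  # ['a', 'c']
```

Dropping one of each highly correlated pair keeps the regression coefficients easier to interpret, since two near-duplicate predictors otherwise compete to explain the same variation.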
Normalize data with a z-score before applying k-means so that Euclidean distance is not dominated by the income axis, highlighting why scaling incomes and credit scores matters for clustering.
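A small Python sketch shows why the scaling matters. The four customers below are invented for illustration; the point is only to compare Euclidean distances before and after z-score normalization.

```python
from math import dist
from statistics import mean, pstdev


def z_scores(values):
    """Rescale values to mean 0 and standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical customers: incomes in dollars, credit scores in points.
incomes = [30_000, 32_000, 90_000, 88_000]
credit_scores = [800, 500, 790, 510]

raw = list(zip(incomes, credit_scores))
scaled = list(zip(z_scores(incomes), z_scores(credit_scores)))

# Customer 0 vs 1: similar income, very different credit score.
# Customer 0 vs 2: very different income, similar credit score.
raw_01, raw_02 = dist(raw[0], raw[1]), dist(raw[0], raw[2])
scaled_01, scaled_02 = dist(scaled[0], scaled[1]), dist(scaled[0], scaled[2])

print(raw_01, raw_02)        # roughly 2,000 versus roughly 60,000
print(scaled_01, scaled_02)  # about the same size after scaling
```

On the raw values the 300-point credit score gap barely registers next to the income gap, so k-means would effectively cluster on income alone. After scaling, both features contribute on a comparable footing.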
Explore data-driven segmentation with decision trees and pattern aggregation to reveal segments like divorced customers with kids and income-based bins for targeted boat rental marketing.
Do you want to supercharge your career by learning the most in-demand skills? Are you interested in data science but intimidated by the need to learn a programming language?
I can teach you how to solve real data science business problems that clients have paid hundreds of thousands of dollars to solve. I'm not going to turn you into a data scientist; no 2-hour, or even 40-hour, online course is able to do that. But this course can teach you skills that you can use to add value and solve business problems from day 1.
This course is different from most for several reasons:
1. We start with problem solving instead of coding. I feel like starting to code before solving problems is misguided; many students are turned off by hours of work to try to write a couple of meaningless lines rather than solving real problems. The key value add data scientists make is solving problems, not writing something in a language a computer understands.
2. The examples are based on real client work. This is not like other classes that use Kaggle data sets for who survived the Titanic, or guessing what type of flower it is based on petal measurements. Those are interesting, but not useful for people wanting to sell more products, or optimize the performance of their teams. These examples are based on real client problems that companies spent big money to hire consultants (me) to solve.
3. Visual workflows. KNIME uses a visual workflow similar to what you'll see in Alteryx or Azure Machine Learning Studio, and I genuinely think it is the future of data science. It is a better way of visualizing the problem as you are exploring data, cleaning data, and ultimately modeling. It also makes your process far easier to explain to non-data scientists, which makes it easier to work with other parts of your business.
Summary: This course covers the full gamut of the machine learning workflow, from data and business understanding, through exploration, cleaning, modeling, and ultimately evaluation of the model. We then discuss the practical aspects of what you can change, and how you can change it, to drive impact in the business.