
That's a difficult question. That's like saying, "What is love?" Love can be many things to different people!
But put simply, Alteryx is a tool that will allow you to increase the efficiency of your data handling. Cleaning that we used to perform solely in Excel can be automated in Alteryx. Think of it as macro record but on a whole different scale.
Alteryx has built-in function that replicate the same cleaning we do for data and it's so easy to learn! In fact, in all honesty, at the time of this writing and filming I had only been using it for less than 3 weeks! That's how easy it is to learn!
In this video you'll see a very quick demonstration of how Alteryx can replace what we do in excel and it's so easy anybody can do it! Enjoy!
Every analysis in Alteryx is generally made up of 3 parts.
Input - The data that you bring in for analysis, manipulation or transfer
Analysis - the actions and functions you perform on the data to convert it from one thing to another. I.e. making the data useful rather than just raw data.
Output - to get it out of Alteryx and into another form such as a PDF form, a HYPER data set for Tableau, CSV outputs, etc.
Before we go further I think it's always good to teach people how to help themselves. Alteryx in my opinion has some of the best Help functions and documentation I've seen across any software. It's great because it will actually show you a working example you can look into and see what it's actually doing.
After this we'll get into filtering which is one of the most common things you'll be doing in order to separate the data to get what you need. The version I show here is the simplest method but filtering can be very comprehensive and can do much more than you see here!
Formulas are another common function you'll be using. Calculations in Alteryx work differently from Excel in that a single equation is automatically applied to all data in the set. In Excel you have to create the calculation followed by extended it downwards to your whole set.
Sampling is the next thing I'll show you which is a great way of limiting your data temporarily. In my case, we deal with large data sets and during a build you don't want to include everything as it hinders performance. So instead we take a small sample first during the prototype.
Sorting is pretty self explanatory. Basically, it sorts your information highest to lowest or alphabetically depending on the data set type.
If you have a large data of let's say, Sales data across regions. You'll most likely want to know what the total sales are for the region. To do this you will use the summarize tool which sum's up all the information and groups them in the format you need.
Summarizing or Aggregating as it's sometimes known does more than sum. You can find the average, minimum or maximum value, count the number of entries or go even further and do statistical calculations.
I love this feature. The idea is that you can find several values and replace them with something else. This is very useful in data cleansing where inconsistencies are present. For example, referring to the same thing using different names such as U.S.A and United States of America.
For machine learning in particular, we'll need to make these consistent for the algorithms to have a good chance of creating a model.
In a lot of cases, we collect information over time. For example, I'll download January information, then February, March and so on across the year so that I have a historical collection of results such as Sales over time.
Combining this information, or Unioning them over time can be frustrating and meticulous work. Alteryx makes this very simple by having its own Union function. Drag and drop and that's basically it!
When data cleansing we sometimes need to separate data from one column into multiple columns to make use of its contents better. For example a person address in a single field could be "55 Huntington Road, Mt Scala, 55689, U.S.A". However, if I was to count the number of entries under U.S.A I wouldn't be able to do it directly. I'd need to extract the U.S.A part first.
The same goes for analyzing information that's been separated but needs to be recombined or reconstituted.
This is an extension of the previous video that will let you recombine your information back to its original state or to do this for data that comes in this form which has happened to me from time to time. it is particularly useful as i don't know any other program that can do it quite this way.
I see the road to Machine Learning and Data Science as being a two part problem. Machine learning is about taking known inputs and outputs to a system and having the model develop its own algorithm. Sort of the opposite to traditional mathematics where you usually have the algorithm and the inputs.
But in order to do this you read training data to give to the model (More on this in later topics). In order to prepare this data you'll need to know how to clean, prepare and manipulate data. This will be our first mission.
The course is constructed in the following manner:
Beginner - Inputs/Outputs, Basic cleansing tools such as sorting, filtering, browsing, formulas, aggregates and more.
Intermediate - Handling dates and time, more filtering and formulas, crosstabs, transposition, joins, appends, sampling, spatial and reporting.
Expert - Fuzzy matching, auto field, row/column manipulation, aggregates, weighted average, grouping, spatial and reporting
Specialized - These are subjects that aren't necessary in data cleansing but if mastered provide significant advantages in cleaning more complex data! This includes the use of Macros, Apps, Connectors, In-Database and More!
We'll be starting with one of the most common types of data sets you'll be using; the CSV. Standing for Comma separated values, its a text based data set where each column of data is separated by you guess it...a comma.
You have a number of options in reading such data which I'll cover in this video.
An Access database is another common type of data set to connect to. For those who have somewhat outgrown Microsoft Excel's capabilities, Microsoft Access is the next best option as its freely available and has all the basic tools to store relatively large data sets, perform complex queries, create interfaces and forms and finally reports.
Unfortunately, you will not be able to join in this example as I cannot upload an Access Database as a training file. However, feel free to use one of your own.
The Tab delimited structure is similar to the CSV structure but instead of a Comma, a Tab is used as a separator. To read it in Alteryx requires a few extra steps but relatively straight forward.
I began filming XML and JSON uploads but realized that it requires some more advanced tools to complete the entire cleansing sequence which we have yet to cover. So instead, this video is to let JSON and XML students know that I haven't forgotten about you but that this topic will be covered later in the course once you've learned the cleaning tools we'll need.
In this video we'll upload multiple sheets at once. There's two methods to do this. The first is to do it manually and use a union to combine the data. This is my usual go-to method as it affords me lots of control. However, the limitation is when you have many sheets and performing the union manually would simply take too much time.
In this case you can use the Dynamic Input tool which will import multiple sheets at once!
The Wildcard method allows you to import multiple Excel files with the same structure all at the same time. It requires a few prerequisites but relatively straight forward. This method is great for data that must be collected over time.
This is a convenient method of importing data. In some cases, downloading data from the internet or certain software's will give you a zipped file containing many individual files. This is for convenience so that you don't need to download each file individually.
This is the easiest method by far as all you need is a copy and paste. For any set of table data, you can simply copy and paste the data straight into Alteryx. This will make use of Text Input function and store the data in the Alteryx Workbook itself.
The Select tool is one of the most common tools you'll be using in your flows. It allows you to choose which fields of data to include, what data types are allocated to each field, allows you to rename and provide descriptions, add suffix's and prefixes and more!
Sorting data is a very common occurrence when cleaning or preparing it. In Alteryx its a simple tool which lets you sort a single field or multiple fields at once.
Filtering lets you chop and divide your data based on rules you set such as only extracted data where the data is past a specific date you set. You can also apply filters to strings, numbers and more.
In this particular example we will only be using the basic filter and will cover the advanced filter in later sections.
This is a huge time saver as it will let you clean up data in a variety of ways with just a few clicks. Compared to Excel where you'd have to apply formulas as well as need some more advanced skill, the Alteryx Data Cleanser will let you clean blanks/empty cells, remove white space, line breaks, tabs, letters, punctuation and lastly modify the capitalization of a string.
This is yet another important tool that you'll be using a lot. In the Accelerator program we only covered its basic use. In this lesson we'll take a look at the different numeric, string and statistical options you have.
Most Text-to-Column functions do just that...text to columns. But in Alteryx you have the option of also doing Text-to-Rows. In this video we'll cover applications of both as well as using it in conjunction with Record ID, Sorting and finally the Summarize Tool with the Concatenation feature.
The Find and Replace tool has many features you can use. in this video we cover how to choose which portion of your text to replace, case insensitivity, Whole Word matching and lastly appending. I personally use this tool a lot to help me clean large sets of data without using Excel-style methods of cleaning.
The formula tool has many features but too much to cover in this one video. hence we'll only be covering the methods which are the most useful for beginners which are basic arithmetic operations, text-based cleansing, multiple calculations, calculation order/sequence and finally concatenation.
Firstly, don't forget to download the PDF guide which will help you understand joins as you get started. For beginners it can be a bit confusing but remember to always start with something basic and expand your understanding from other.
Put simply, a join will try and combine two data sets by using a common field. If they match then they will join together, if they don't match, then they won't join. But you can decide on which data sets can be included or excluded.
Continuing on from single joins are multi-joins. essentially the same thing except that you are joining on more than one common field. Very useful for examples where you want to join two data sets such as by year and region. If you only join one, you most likely will create unintentional duplicates.
As a starting point be sure to download the new working data sets. In this example we'll go through a few practice sets so you can get a feel for how joins perform. Later on in the course, we'll be using these in more complex ways so you can see how I handle them.
One added benefit of understanding joins is that joins are present in a majority of modern software packages. hence this knowledge will extend beyond Alteryx. Yay!
When joining data sets you need a common field. However, what if you don't have a unique row? Well in that case you can use the Record ID tool which will give you a unique ID or sometimes known as a primary key in other programs. It's a very neat and clean way to preserve and control your data sets especially when your data sets get massive.
I love stocks! I love analyzing the stock marketing, running simulations of investments and trying to figure out what the market will do next! Why do I want to do this? Well, like many other Australians, Americans and people around the world, I want to take my future into my own hands. I want my money to work for me so that it can take care of my family.
Having the ability to analyze data as far back as 100 years or more gives you a massive advantage to not just listen to people on the internet and their more often than not illogical or "no-logic" conclusions but to analyze and let the numbers speak for themselves.
In this video we'll start with an already existing portfolio of stocks and see how it performs against the U.S. and Australian economies. This is to understand how this particular portfolio will behave during the Global Financial Crisis, COVID-19 or any other tragedies in the future.
I've been using visualization tools such as Tableau for a very long time and before that Excel. When data is clean its so much fun to do visualizations and reporting. However, in reality, data is more often than not very dirty, messy, incomplete or needs to be combined with other data sets.
Therefore, you need a tool that can help you combine, consolidate and manipulate data in just the way you need it. In addition this, the other problem with standard business processes is the repetitive manual work we need to perform every single day. The routine is always the same. Data comes in, we filter here and there, do some calculations and send it out. Wouldn't it be nice to automate this without needing to know programming languages such as VBA, SQL or Python?
Well the answer is Alteryx! Alteryx has built in, drag and drop tools designed specifically to make the data cleansing process easier. What would usually take me hours to do in Excel I can do in just minutes in some cases. The beauty is that once I've set it up the first time all I need to do is press RUN! This saves me a lot of time!
***
The other benefit of Alteryx is the ability to perform MACHINE LEARNING analysis. Machine Learning or Artificial Intelligence in some circles, is the ability to predict and classify data. For example, when watching Netflix, they collect your viewing habits to find recommendations for movies you may like. Or on Amazon and eBay, your buying habits are analyzed to predict the most likely items you'd want to but in the future. But it doesn't stop here! Machine Learning is a MASSIVE and EXCITING space to get into!
if you're like, who doesn't have a Computer Science, Programming or Data Science background then this course is for you. No fancy languages or terminology. In this course I'll teach you the basics of Alteryx and ultimately get you started on Machine Learning!