Almost all companies these days are investing heavily in data analysis. In fact, studies suggest that around 73% of organizations have invested in Big Data. Why do you think that is the case? What can you reap from data that is, at bottom, just 1s and 0s? Moreover, how does this data shape an organization’s future?
Most of you might have guessed it: market trends and consumer habits can all be predicted with precision if we are able to analyze our data efficiently. This Learning Path will show you how to achieve all this using Julia.
Packt’s Video Learning Paths are an amalgamation of multiple video courses that are logically tied together to provide you with a broader learning experience.
With the amount of data that is generated in the world these days, we are faced with the challenge of analyzing this data. Julia, which enjoys the benefits of a sophisticated compiler, parallel execution, and an all-encompassing mathematical function library, acts as a very good tool that helps us work with data more efficiently.
In this Learning Path, you will embark on your journey with the basics of Julia, starting with installing it on your system and setting up the environment. You will then be introduced to basic machine learning techniques, data science models, and concepts of parallel computing.
After completing this Learning Path, you will have acquired all the skills that will help you work with data effectively.
About the Authors
Ivo Balbaert is currently a web programming and databases lecturer at CVO Antwerpen, a community college in Belgium. He received a PhD in applied physics in 1986 from the University of Antwerp. He worked for 20 years in the software industry as a developer and consultant in several companies, and, for 10 years, as a project manager at the University Hospital of Antwerp. In 2000, he switched over to partly teach and partly develop software (KHM Mechelen, CVO Antwerp).
Jalem Raj Rohit is an IIT Jodhpur graduate with a keen interest in machine learning, data science, data analysis, computational statistics, and natural language processing (NLP). Rohit currently works as a senior data scientist at Zomato, having previously been the first data scientist at Kayako. He is part of the Julia project, where he develops data science models and contributes to the codebase. Additionally, Raj is a Mozilla contributor and volunteer, and has interned at Scimergent Analytics.
We are going to install Julia with any one of the common development environments available.
Program data needs to be stored efficiently and in an easy-to-use form.
This video deals with the problem of how to control the order of execution in Julia code and what to do when errors occur.
Julia code is much less performant and readable when it is not subdivided into functions.
Arrays can only be accessed by index, and all their elements have to be of the same type. We want more flexible data structures; in particular, we want to also store and retrieve data by keys.
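As a minimal sketch, Julia's built-in `Dict` provides exactly this key-based storage:

```julia
# Arrays are indexed by integer position; a Dict stores and retrieves
# values by arbitrary keys instead.
scores = Dict("alice" => 10, "bob" => 7)
scores["carol"] = 9              # insert a new key/value pair
scores["bob"]                    # look up by key → 7
haskey(scores, "dave")           # → false
```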
Data is often presented in the form of a matrix. We need to know how to work with matrices in order to work on data.
The aim of the video is to show you the importance of using types and parametrized methods in writing generic and performant code.
Coding is often a repetitive task. Shorten your code, make it more elegant and avoid repetition by making and using macros.
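For illustration, here is a tiny hypothetical macro, `@twice`, that splices its argument into the generated code two times, so the caller writes the expression only once:

```julia
# @twice expands its argument expression twice; esc() makes the
# expression refer to variables in the caller's scope.
macro twice(ex)
    quote
        $(esc(ex))
        $(esc(ex))
    end
end

counter = 0
@twice counter += 1
counter   # 2: the macro expanded the increment twice
```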
Why do we need something to structure a Julia package? For two reasons: a package can contain multiple files, and different packages can have functions with the same name, which would otherwise conflict.
Functionality that you need in your project is often already written and available as a package. How do you search for, install, and work with these packages?
In order to process data, we need to get it out of its data sources and into our Julia program.
Working with tabular data in matrices is possible, but not very convenient. The DataFrame offers us a more convenient data structure for data science purposes.
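A minimal sketch, assuming the external DataFrames.jl package is installed:

```julia
using DataFrames  # external package: DataFrames.jl

# A DataFrame holds named, typed columns of tabular data.
df = DataFrame(name = ["alice", "bob", "carol"], age = [34, 28, 41])
size(df)                        # (3, 2)
over30 = df[df.age .> 30, :]    # filter rows with a boolean condition
over30.name                     # access a column by name
```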
What possibilities does the DataFrame offer for data manipulation?
Relational databases are an important data source. How can we work with the data in these sources from Julia?
In certain situations data is better stored in NoSQL databases. Julia can work with a number of these through specialized packages; amongst them are Mongo and Redis.
We need to calculate various statistical numbers to get insight into a dataset. How can we do this with Julia?
Data must be visualized graphically to get better insight into it. What possibilities does Julia offer in this area?
Scatterplots, histograms, and box plots are some of the basic tools of the data scientist. We investigate our iris data by using each of them in turn.
In statistical investigations, we need to be able to define distributions, cluster data into groups, and test hypotheses.
A lot of useful libraries written in R are not yet implemented in Julia. Can we use these R libraries from Julia code?
Data must be prepared before machine learning algorithms can be applied. Furthermore, applying an algorithm follows a specific cycle, which we will review here. The MLBase package will be used in this section.
Data often needs to be classified in groups; Decision Tree is one of the basic algorithms to do that.
In a realistic setting, a model is first trained, and then tested.
To obtain better linear regression models, and to be able to work with more independent variables, we need more generalized linear modeling.
We need a better classification algorithm than Decision Trees for more complex data, like in pattern recognition. The Support Vector Machine is developed for these tasks.
In this video, we will explain ways in which you can handle files with the comma-separated values (CSV) file format.
TSV files are files whose contents are separated by tabs. In this video, we will explain how to handle TSV files.
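As a minimal sketch, a small TSV string can be parsed with Base functions alone (real projects typically use the CSV.jl package with a tab delimiter):

```julia
# Split a TSV string into rows, then split each row on tabs.
tsv = "name\tscore\nalice\t10\nbob\t7"
rows = [split(line, '\t') for line in split(tsv, '\n')]
header  = rows[1]                               # column names
records = rows[2:end]                           # data rows
scores  = [parse(Int, r[2]) for r in records]   # second column as Ints
```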
This video will teach you how to interact with websites by sending and receiving data through HTTP requests.
We will see how Julia programs are interpreted and represented.
Make tasks that process strings both time- and space-efficient.
Create an expression with a single argument.
Constructing Expression objects that involve multiple objects and/or variables is difficult.
Evaluating an expression object.
Compile code directly, rather than using the conventional method of constructing expression objects and calling the eval function.
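A minimal sketch of these expression techniques in Julia:

```julia
# Quote an expression: it is stored as data, not yet executed.
ex = :(2 + 3 * 4)
typeof(ex)            # Expr
result = eval(ex)     # evaluate in the current (global) module → 14

# Expressions can also be built programmatically from parts:
ex2 = Expr(:call, :+, 1, 2)
eval(ex2)             # → 3
```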
Metaprogramming techniques help speed up the process of dealing with data frames.
You will learn about doing statistics in Julia, along with common problems in handling data arrays, distributions, estimation, and sampling techniques.
Descriptive statistics helps us estimate the shape and features of data for model and algorithm selection.
Deviation metrics help calculate the distance between two vectors. These metrics help us understand the relationship between the different vectors and the data in them.
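As a hand-rolled illustration (packages such as Distances.jl provide optimized versions of these and many more metrics):

```julia
# Two common deviation metrics between numeric vectors.
euclidean(a, b) = sqrt(sum((a .- b) .^ 2))   # straight-line distance
manhattan(a, b) = sum(abs.(a .- b))          # sum of absolute differences

a = [1.0, 2.0, 3.0]
b = [4.0, 6.0, 3.0]
euclidean(a, b)   # 5.0
manhattan(a, b)   # 7.0
```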
Sampling is the process where sample units are selected from a large population for analysis.
Correlation analysis is the process that indicates the similarity and relationship between two random variables.
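A minimal sketch using `cor` from the Statistics standard library:

```julia
using Statistics  # standard library

x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]   # perfectly positively related to x
z = [8.0, 6.0, 4.0, 2.0]   # perfectly negatively related to x
cor(x, y)                  # ≈  1.0 (Pearson correlation)
cor(x, z)                  # ≈ -1.0
```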
In this video, you will learn about the concept of dimensionality reduction.
This video will let you explore the linear regression model, which can be used to explain the relationship between a single dependent variable and an independent variable.
Linear regression is a linear model that is used to determine and predict numerical values. We will deal with that in this video.
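As a minimal sketch, an ordinary least-squares fit can be computed with Julia's backslash operator on a small synthetic dataset (the values here are made up for illustration):

```julia
# Fit y = a + b*x by least squares using the \ operator.
x = [1.0, 2.0, 3.0, 4.0]
y = 2.0 .+ 3.0 .* x          # exact line: intercept 2, slope 3
X = [ones(length(x)) x]      # design matrix with an intercept column
coef = X \ y                 # solves the least-squares problem
# coef ≈ [2.0, 3.0]  (recovered intercept and slope)
```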
What could we do in those scenarios where the variable of interest is categorical in nature, such as whether a customer buys a product, whether a credit card is approved, or whether a tumor is cancerous? Logistic regression is the best solution for these.
Analysis of performance is very important for any analytics and machine learning processes. In this video, we will deal with performance evaluation and model selection.
In this video, we will deal with cross-validation, one of the most underrated processes in the domain of data science and analytics.
In statistics, the distance between vectors or datasets is computed in various ways, depending on the problem statement and the properties of the data. In this video, we will deal with distances.
In this video, we will deal with different types of distributions.
Time series is another very important form of data. This video deals with time series analysis.
Plotting arrays is important in visualization, as arrays are a quick way to store data.
DataFrames are the best way for representing tabular data.
Use several functions for both the transformation and exploratory analytics steps, and plot functions separately as well as stacking several functions in a single plot.
It is through the exploration of the data that we find possible patterns, which can be identified through basic statistics and the shape of the data using plots and visualizations.
Line plots can be used both to understand correlations and to look at data trends.
Scatter plots help visualize the data distribution and show the relationship between the corresponding columns, which in turn helps identify some prominent patterns in the data.
Histograms are one of the best ways to visualize and find the three main statistics of a dataset: the mean, median, and mode.
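These three statistics can be computed directly, as a minimal sketch (the mode is computed by hand here; `argmax(f, itr)` requires Julia 1.7 or later):

```julia
using Statistics  # standard library: mean and median

data = [1, 2, 2, 3, 3, 3, 4]
mean(data)     # ≈ 2.571
median(data)   # 3.0
# Mode: the value with the highest count among the unique values.
modeval = argmax(v -> count(==(v), data), unique(data))   # 3
```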
Customizing a plot enhances its visualization even further.
Optimize data movement: it is quite common and should be minimized because of the time and network overhead it incurs.
Learn about the famous MapReduce framework, why it is one of the most important ideas in the domains of big data and parallel computing, and how to parallelize loops and apply reducing functions to them across several CPUs and machines.
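A minimal sketch with the Distributed standard library; with worker processes added via `addprocs()`, the iterations are spread across CPUs, and the same code also runs correctly on a single process:

```julia
using Distributed  # standard library for parallel computing

# Map-reduce style: square each i (map), combine with + (reduce).
total = @distributed (+) for i in 1:100
    i^2
end
# total == 338350, the sum of squares from 1 to 100
```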
Channels are like background plumbing for parallel computing in Julia: they are the reservoirs from which the individual processes access their data.
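A minimal sketch of a producer task filling a Channel that a consumer then drains:

```julia
# The do-block runs as a producer task; the channel is closed
# automatically when the task finishes.
ch = Channel{Int}(5) do c
    for i in 1:5
        put!(c, i * 10)   # producer fills the channel
    end
end

collected = collect(ch)   # consumer drains it: [10, 20, 30, 40, 50]
```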
Packt has been committed to developer learning since 2004. A lot has changed in software since then - but Packt has remained responsive to these changes, continuing to look forward at the trends and tools defining the way we work and live. And how to put them to work.
With an extensive library of content - more than 4000 books and video courses - Packt's mission is to help developers stay relevant in a rapidly changing world. From new web frameworks and programming languages, to cutting-edge data analytics and DevOps, Packt takes software professionals in every field to what's important to them now.
From skills that will help you to develop and future proof your career to immediate solutions to every day tech challenges, Packt is a go-to resource to make you a better, smarter developer.
Packt Udemy courses continue this tradition, bringing you comprehensive yet concise video courses straight from the experts.