Julia: Performing Statistical Computations
2.1 (3 ratings)
24 students enrolled

Mould your programming skills by carrying out dynamic numerical computations with Julia
Created by Packt Publishing
Last updated 5/2017
English [Auto-generated]
Current price: $11.99 Original price: $199.99 Discount: 94% off
30-Day Money-Back Guarantee
This course includes
  • 1.5 hours on-demand video
  • 14 articles
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion
What you'll learn
  • Get familiar with the key concepts in Julia
  • Follow a comprehensive approach to learning Julia programming
  • Get extensive coverage of Julia’s packages for statistical analysis
  • Sharpen your skills to work more effectively with your data
Requirements
  • The software requirements assume you are running one of the following operating systems: Linux, Windows, or OS X
  • There are no specific hardware requirements, other than a desktop or, preferably, a laptop on which to run all your code

Julia is a high-performance dynamic programming language for numerical computing. This practical guide to programming with Julia will help you to work with data more efficiently.

This course begins with the important features of Julia to help you quickly refresh your knowledge of functions, modules, and arrays. We’ll explore how to use Julia to identify, retrieve, and transform datasets so that you can perform efficient data analysis and data manipulation.

You will then learn the concepts of metaprogramming and statistics in Julia.

Moving on, you will learn to build data science models using techniques such as dimensionality reduction and linear discriminant analysis.

You’ll learn to optimize data science programs through parallel computing and careful memory allocation. You’ll also become familiar with package development and networking concepts for solving numerical problems on the Julia platform.

This course includes sections on identifying and classifying data science problems, data modelling, data analysis, data manipulation, multidimensional arrays, and parallel computing.

By the end of this course, you will acquire the skills to work more effectively with your data.

What am I going to get from this course?

  • Extract and manage your data efficiently with Julia

  • Explore the metaprogramming concepts in Julia

  • Perform statistical analysis with StatsBase.jl and Distributions.jl

  • Build your data science models

  • Find out how to visualize your data with Gadfly

  • Explore big data concepts in Julia

What’s special about this course?

We've spent the last decade working to help developers stay relevant. The structure of this course is a result of deep and intensive research into what real-world developers need to know in order to be job-ready. We don't spend too long on theory, and focus on practical results so that you can see for yourself how things work in action.

We have combined the best of the following Packt products:

  • Julia Cookbook by Jalem Raj Rohit
  • Julia Solutions by Jalem Raj Rohit

Meet your expert instructors:

Jalem Raj Rohit is an IIT Jodhpur graduate with a keen interest in machine learning, data science, data analysis, computational statistics, and natural language processing (NLP). Rohit currently works as a senior data scientist at Zomato, also having worked as the first data scientist at Kayako. He is part of the Julia project, where he develops data science models and contributes to the codebase. 

Meet your managing editor:

This course has been planned and designed for you by me, Shiny Poojary. I'm here to help you be successful every step of the way, and get maximum value out of your course purchase. If you have any questions along the way, you can reach out to me and our author group via the instructor contact feature on Udemy.

Who this course is for:
  • This course is for Julia programmers who want to learn data science right from exploratory analytics to the visualization part.
  • Anyone who wants to work more effectively with data
Course content
37 lectures 02:27:08
+ Getting Started
5 lectures 24:18

In this section, we will explain ways in which you can handle files with the Comma-separated Values (CSV) file format.

Preview 06:00
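The following minimal Julia sketch shows what reading a CSV file involves. It does not use the packages the course relies on. Instead it splits lines and fields with Base string functions, and it assumes a simple file with no quoted fields or embedded commas. The column names and values are made up, and an IOBuffer stands in for a file on disk.

```julia
# Minimal CSV reader for simple files: no quoted fields or embedded delimiters
function read_simple_delimited(io::IO, delim::AbstractChar)
    lines = filter(!isempty, readlines(io))        # skip blank lines
    header = String.(split(lines[1], delim))        # first row holds column names
    rows = [split(line, delim) for line in lines[2:end]]
    return header, rows
end

# Hypothetical CSV content
csv_text = """
height,weight
1.62,55.0
1.75,72.5
1.80,80.0
"""

header, rows = read_simple_delimited(IOBuffer(csv_text), ',')

# Convert the string fields to a numeric matrix (rows = records, columns = fields)
data = [parse(Float64, row[j]) for row in rows, j in 1:length(header)]

println(header)
println(data)
```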

In this section, we will explain how to handle Tab Separated Values (TSV) files.

Handling TSV files requires the DataFrames package. Since it was already installed as instructed in the previous section, we can move ahead and make sure that all packages are up to date. The following video shows how to proceed:

Handling data with TSV files
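TSV files differ from CSV files only in their delimiter. The sketch below writes a small, hypothetical table as tab-separated text and then parses it back with plain Base functions. It illustrates the format only. It is not the DataFrames-based workflow covered in the video.

```julia
# Hypothetical records: a name column and an integer score column
records = [("alice", 91), ("bob", 78)]

# Write tab-separated text; an IOBuffer stands in for a file
io = IOBuffer()
println(io, join(["name", "score"], '\t'))
for (name, score) in records
    println(io, join([name, string(score)], '\t'))
end
tsv_text = String(take!(io))

# Read it back: split into lines, then split each line on tabs
lines = split(chomp(tsv_text), '\n')
header = split(lines[1], '\t')
parsed = [(fields[1], parse(Int, fields[2]))
          for fields in [split(line, '\t') for line in lines[2:end]]]

println(header)
println(parsed)
```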

In this section, we will explain ways to handle data stored in databases: MySQL and PostgreSQL.

Working with databases in Julia

In this section, you will learn how to interact with the Web through HTTP requests, both to get data from the Web and to post data to it. You will learn to send requests to websites, receive their responses, and analyze those responses.

Interacting with the Web
Quiz Time!
2 questions
+ Metaprogramming
7 lectures 29:23

In this section, you will study the life of a Julia program and how it is actually represented and interpreted by Julia. You will also learn what is meant by "a language expressing its own code as a data structure of itself."

Preview 06:14

In this section, you will learn about symbols and expressions, which are syntactically central to metaprogramming in Julia. Studying them in detail will help you appreciate the concepts covered so far and those that follow.

Symbols and expressions
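Here is a short illustration of symbols and expression trees in plain Julia. The variable name `radius` is hypothetical.

```julia
# A Symbol is an interned name; the colon prefix creates one
s = :radius
println(typeof(s))                     # Symbol

# Symbols can also be built from strings; equal names give the identical object
println(Symbol("rad", "ius") === s)    # true

# An Expr is a tree whose head and arguments are Symbols and values
ex = Expr(:call, :*, 2, :radius)       # the same tree the parser builds for 2 * radius
println(ex.head, " ", ex.args)
```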

Using a colon to represent expressions is known as quoting. The characters inside the parentheses after the colon constitute an Expr (expression) object. In the following video, we will create an expression with a single argument:
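The minimal sketch below shows quoting. It is not the exact example from the video. Both the colon form and the `quote ... end` block produce `Expr` objects whose parts can be inspected.

```julia
# Quoting: a colon followed by parentheses yields an Expr instead of a value
ex = :(x + 1)
println(typeof(ex))   # Expr
println(ex.head)      # call
println(ex.args)      # the operator followed by its operands

# quote ... end quotes a multi-line block the same way
block = quote
    y = 2
    y * 3
end
println(typeof(block))  # Expr
```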


Sometimes, constructing Expr objects directly is difficult, especially when multiple objects and/or variables are involved. Interpolation, which splices values into a quoted expression using $, makes expression construction easy and readable.
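The sketch below shows interpolation with `$` and compares it with building the same tree by hand through `Expr` constructors. The variable names are arbitrary.

```julia
# A value computed now is spliced into the quoted expression with $
a = 10
ex = :(x + $a)          # the Expr contains the literal 10, not the name a
println(ex)

# Interpolating a whole sub-expression builds larger trees readably
inner = :(y * 2)
outer = :($inner - 1)   # same tree as :(y * 2 - 1)

# The same tree written with Expr constructors is much harder to read
manual = Expr(:call, :-, Expr(:call, :*, :y, 2), 1)
println(outer == manual)  # true
```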


The eval() function is used to execute, or evaluate, an Expr object. Evaluation always takes place in global scope.

The Eval function
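Here is a small sketch of `eval` and its global scope. The names `counter_global` and `try_local` are hypothetical and were chosen for the illustration.

```julia
# eval evaluates an Expr in the global scope of the current module
ex = :(3 * 4 + 1)
println(eval(ex))        # 13

# Because evaluation is global, an assignment inside eval defines a global variable
eval(:(counter_global = 100))
println(counter_global)  # 100

# Inside a function, eval still works at global scope and cannot see locals
function try_local()
    local_value = 5
    # eval(:(local_value + 1)) would throw UndefVarError: local_value is not a global
    return eval(:(counter_global + 1))
end
println(try_local())     # 101
```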

In this video, you will be introduced to macros, which are used to insert generated code into programs. A macro takes expressions as input and returns a new expression, which is inserted into the program and compiled directly, instead of the conventional approach of constructing an expression and running it with the eval() function. The advantage of using macros is that a block of code that would otherwise have to be hard-coded multiple times can be generated on the fly.
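As a minimal example, the hypothetical `@twice` macro below generates two copies of whatever expression it receives. It is an illustrative sketch, not a macro from the course.

```julia
# A macro receives unevaluated expressions and returns a new expression,
# which is spliced into the program before compilation; no eval() call is needed
macro twice(ex)
    # esc() marks the expression as belonging to the caller, so its variables
    # are resolved where the macro is used rather than where it is defined
    return quote
        $(esc(ex))
        $(esc(ex))
    end
end

log_entries = String[]
@twice push!(log_entries, "hello")
println(length(log_entries))   # 2

# @macroexpand shows the generated code without running it
println(@macroexpand @twice push!(log_entries, "hello"))
```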


In this section, you will learn how to apply metaprogramming concepts to DataFrames.

Metaprogramming with DataFrames
Quiz Time!
2 questions
+ Statistics with Julia
4 lectures 21:51
Basic statistics concepts

Metrics that help calculate the distance or similarity between two vectors are called deviation metrics. These metrics help us understand the relationship between the different vectors and the data in them.

Deviation metrics
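Deviation metrics can be written directly from their definitions. This hand-written Julia sketch computes several common ones for two small, made-up vectors. StatsBase.jl exposes similar functions (for example `L1dist`, `sqL2dist`, `meanad`, `maxad`, `msd`, and `rmsd`), but the function names used here are our own.

```julia
# Deviation metrics between two equal-length vectors
l1_distance(a, b)    = sum(abs, a .- b)            # sum of absolute differences
sq_l2_distance(a, b) = sum(abs2, a .- b)           # sum of squared differences
l2_distance(a, b)    = sqrt(sq_l2_distance(a, b))  # Euclidean distance
mean_abs_dev(a, b)   = l1_distance(a, b) / length(a)
max_abs_dev(a, b)    = maximum(abs, a .- b)
mean_sq_dev(a, b)    = sq_l2_distance(a, b) / length(a)
rms_dev(a, b)        = sqrt(mean_sq_dev(a, b))

# Hypothetical vectors
x = [1.0, 2.0, 3.0, 4.0]
y = [1.5, 2.0, 2.5, 5.0]

println(l1_distance(x, y))   # 2.0
println(mean_sq_dev(x, y))   # 0.375
println(rms_dev(x, y))
```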
Correlation analysis
Quiz Time!
2 questions
+ Building Data Science Models
9 lectures 32:43
Dimensionality reduction

Linear discriminant analysis is an algorithm used for classification tasks. It finds the linear combination of the input features that best separates the observations into classes. In this video there are two classes; however, discriminant analysis also handles more than two classes, in which case it is called multi-class linear discriminant analysis.

Linear discriminant analysis
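For two classes, Fisher's linear discriminant can be computed with the LinearAlgebra standard library alone. The direction is obtained by solving the within-class scatter matrix against the difference of the class means. A point is then assigned to a class by comparing its projection with the projected midpoint of the means. The sketch below uses made-up 2-D data. The midpoint threshold is one common choice, not necessarily the one used in the video.

```julia
using LinearAlgebra  # dot and the backslash solver
using Statistics     # mean

# Scatter matrix of the rows of X around their mean
function scatter_matrix(X::AbstractMatrix)
    centered = X .- mean(X; dims=1)
    return centered' * centered
end

# Two-class Fisher LDA: w = Sw \ (m1 - m0), threshold at the projected midpoint
function fit_lda(X0::AbstractMatrix, X1::AbstractMatrix)
    m0 = vec(mean(X0; dims=1))
    m1 = vec(mean(X1; dims=1))
    Sw = scatter_matrix(X0) + scatter_matrix(X1)   # within-class scatter
    w = Sw \ (m1 - m0)                             # discriminant direction
    threshold = dot(w, (m0 + m1) / 2)
    return w, threshold
end

predict_lda(w, threshold, x) = dot(w, x) > threshold ? 1 : 0

# Hypothetical 2-D data: rows are observations
X0 = [1.0 2.0; 2.0 3.0; 3.0 3.0; 2.0 1.0]   # class 0
X1 = [6.0 5.0; 7.0 8.0; 8.0 6.0; 7.0 7.0]   # class 1

w, t = fit_lda(X0, X1)
println(predict_lda(w, t, [2.0, 2.0]))
println(predict_lda(w, t, [7.0, 6.0]))
```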

Data preprocessing is one of the most important parts of an analytics or data science pipeline. It involves methods and techniques to sanitize the data being used, quick hacks for making the dataset easy to handle, and the elimination of unnecessary data to make it lightweight and efficient in the analytics process. In this video, we will use Julia's MLBase package, which is known as the Swiss Army Knife of writing machine learning code. Installation and setup instructions for the library will be explained in the Getting ready section.

Data preprocessing
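To illustrate typical preprocessing steps, the sketch below drops rows with missing values, standardizes numeric columns, and encodes string labels as integers. It uses only the Statistics standard library and hypothetical data. It does not reproduce MLBase.jl's API.

```julia
using Statistics

# Keep only rows with no missing values
drop_missing_rows(X) = X[[!any(ismissing, r) for r in eachrow(X)], :]

# Standardize each column to zero mean and unit standard deviation
function standardize_columns(X::AbstractMatrix{<:Real})
    μ = mean(X; dims=1)
    σ = std(X; dims=1)
    return (X .- μ) ./ σ
end

# Map category labels to integer codes (sorted order of the distinct labels)
function encode_labels(labels::AbstractVector)
    levels = sort(unique(labels))
    codes = Dict(l => i for (i, l) in enumerate(levels))
    return [codes[l] for l in labels], levels
end

# Hypothetical raw data with one missing entry
raw = [1.0 10.0; 2.0 missing; 3.0 30.0; 5.0 50.0]
clean = Float64.(drop_missing_rows(raw))   # 3 complete rows, now plain Float64
Z = standardize_columns(clean)
codes, levels = encode_labels(["red", "green", "red", "blue"])

println(Z)
println(codes, " ", levels)
```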

Linear regression is a linear model used to predict numerical values. It is one of the most basic and important starting points for understanding linear models and predictive analytics. For this video, we will use Julia's GLM.jl package.

Linear regression
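Under the hood, simple linear regression is an ordinary least-squares fit. The sketch below solves it with Julia's backslash operator on made-up data. With GLM.jl, the same model is typically written as `lm(@formula(y ~ x), df)` on a DataFrame, which also reports standard errors and other statistics that this sketch omits.

```julia
using Statistics

# Ordinary least squares: solve X * β ≈ y, where X has a leading column of ones
function fit_ols(x::AbstractVector{<:Real}, y::AbstractVector{<:Real})
    X = hcat(ones(length(x)), x)   # design matrix with an intercept column
    return X \ y                   # least-squares solution (QR-based for tall X)
end

# Hypothetical data, roughly following y = 1 + 2x
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.9, 5.2, 6.8, 9.1]

intercept, slope = fit_ols(x, y)
predicted = intercept .+ slope .* x

# Coefficient of determination: share of variance explained by the fit
r2 = 1 - sum(abs2, y .- predicted) / sum(abs2, y .- mean(y))
println((intercept, slope, r2))
```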

Classification is one of the core concepts of data science; it attempts to sort data into different classes or groups. A simple example is classifying a population of people as male or female based on the data provided. In this video, we will learn to perform score-based classification, where each class is assigned a score and the class with the lowest or highest score is selected, depending on the problem and the analyst's choice.
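Score-based classification reduces to picking, for each observation, the class with the highest or lowest score. In this sketch the scores and class names are hypothetical. The `pick` keyword switches between the two rules.

```julia
# Each row of `scores` holds one score per class; the predicted label is the
# class whose score is selected by `pick` (argmax for highest, argmin for lowest)
function classify_by_score(scores::AbstractMatrix, labels::AbstractVector; pick=argmax)
    return [labels[pick(row)] for row in eachrow(scores)]
end

labels = ["spam", "ham"]            # hypothetical class names
scores = [0.2 0.8; 0.9 0.1]         # rows are samples, columns are classes

println(classify_by_score(scores, labels))               # highest score wins
println(classify_by_score(scores, labels; pick=argmin))  # lowest score wins, e.g. distances
```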


Performance analysis is very important in any analytics or machine learning process, and it also helps in model selection. Several evaluation metrics can be applied to ML models. The right choice depends on the type of problem being handled, the algorithms used in the process, and how the analyst wants to gauge the success of the predictions or the results of the analytics process.

Performance evaluation and model selection
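For binary classification, many common metrics are derived from the four confusion-matrix counts. Here is a minimal sketch with made-up labels, where `true` marks the positive class. The function names are our own.

```julia
# Confusion-matrix counts for binary labels
function confusion_counts(truth::AbstractVector{Bool}, pred::AbstractVector{Bool})
    tp = count(truth .& pred)        # predicted positive, actually positive
    tn = count(.!truth .& .!pred)    # predicted negative, actually negative
    fp = count(.!truth .& pred)      # predicted positive, actually negative
    fn = count(truth .& .!pred)      # predicted negative, actually positive
    return (tp=tp, tn=tn, fp=fp, fn=fn)
end

accuracy_score(c)  = (c.tp + c.tn) / (c.tp + c.tn + c.fp + c.fn)
precision_score(c) = c.tp / (c.tp + c.fp)
recall_score(c)    = c.tp / (c.tp + c.fn)

# Hypothetical ground truth and predictions
truth = Bool[1, 1, 0, 0, 1, 0]
pred  = Bool[1, 0, 0, 1, 1, 0]

c = confusion_counts(truth, pred)
println(c)
println((accuracy_score(c), precision_score(c), recall_score(c)))
```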

Cross validation is one of the most underrated processes in data science and analytics, although it is very popular among practitioners of competitive data science. It is a model evaluation method that gives the analyst an idea of how well the model will perform on data it has not yet seen. It is also extensively used to detect and avoid overfitting, which occurs when a model fits the training set too closely and, as a result, makes inaccurate or high-error predictions on the test set.

Cross validation
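Here is a minimal sketch of k-fold cross validation. Two simplifying assumptions keep it short: fold assignment is deterministic (every k-th observation), and the "model" simply predicts the mean of the training targets. MLBase.jl provides its own cross-validation schemes, such as `Kfold`, which this sketch does not use.

```julia
using Statistics

# Deterministic k-fold split: fold f tests on indices f, f+k, f+2k, ...
function kfold_indices(n::Int, k::Int)
    folds = Tuple{Vector{Int}, Vector{Int}}[]
    for f in 1:k
        test = collect(f:k:n)          # held-out indices for this fold
        train = setdiff(1:n, test)     # everything else is used for fitting
        push!(folds, (train, test))
    end
    return folds
end

# Hypothetical targets; the baseline model predicts the training mean
y = [3.0, 5.0, 4.0, 6.0, 5.0, 7.0]
folds = kfold_indices(length(y), 3)
fold_mse = [mean(abs2, y[test] .- mean(y[train])) for (train, test) in folds]
cv_mse = mean(fold_mse)
println(cv_mse)
```

In practice the indices are usually shuffled first, for example with `Random.shuffle`, so that the folds do not inherit any ordering present in the data.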

A probability distribution assigns a probability to each point or subset of outcomes of a random experiment. Every random experiment (and, in fact, the data of every data science experiment) follows some probability distribution. The type of distribution the data follows is very important, both for starting the analytics process and for selecting the machine learning algorithms to implement. Note also that in a multivariate dataset each variable might follow a separate distribution, so the variables in a dataset need not all follow similar distributions. In this video, we will work with a normal distribution and use the pdf, cdf, and percentile functions.
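The sketch below writes the normal density from its formula. It then estimates the CDF and a percentile from simulated draws, using only the Random and Statistics standard libraries. With Distributions.jl, exact values come from `pdf(Normal(μ, σ), x)`, `cdf(Normal(μ, σ), x)`, and `quantile(Normal(μ, σ), p)`; the empirical estimates here only approximate them.

```julia
using Random      # reproducible random number generator
using Statistics  # quantile

# Closed-form probability density of a normal distribution
normal_pdf(x; μ=0.0, σ=1.0) = exp(-((x - μ)^2) / (2 * σ^2)) / (σ * sqrt(2π))

# Empirical CDF: fraction of samples less than or equal to x
empirical_cdf(samples, x) = count(<=(x), samples) / length(samples)

rng = MersenneTwister(42)
samples = randn(rng, 100_000)      # draws from the standard normal

println(normal_pdf(0.0))               # peak density, equal to 1 / sqrt(2π)
println(empirical_cdf(samples, 0.0))   # close to 0.5
println(quantile(samples, 0.975))      # 97.5th percentile, close to 1.96
```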


Time series are another very important form of data, widely used in stock markets, market analysis, and signal processing. The data has a time dimension, which makes it resemble a signal. In most cases, therefore, signal analysis techniques and formulae such as autocorrelation and cross-correlation, which we have already dealt with in earlier sections, apply to time series data. In this video, we will look at methods for working with datasets in the time series format.

Time series analysis
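Autocorrelation measures how strongly a series resembles a lagged copy of itself. The sketch below computes it at a single lag on a hypothetical series, together with a trailing moving average. StatsBase.jl offers `autocor` for the same purpose. The functions here are hand-written.

```julia
using Statistics

# Sample autocorrelation at one lag, normalized by the lag-0 autocovariance
function autocorrelation(x::AbstractVector{<:Real}, lag::Integer)
    d = x .- mean(x)                                  # demeaned series
    return sum(d[1:end-lag] .* d[1+lag:end]) / sum(abs2, d)
end

# Trailing moving average over a window of w points
moving_average(x, w) = [mean(x[i-w+1:i]) for i in w:length(x)]

series = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]   # hypothetical alternating signal
println(autocorrelation(series, 1))          # strongly negative
println(moving_average([1.0, 2.0, 3.0, 4.0], 2))
```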
Quiz Time!
2 questions
+ Working with Visualizations
8 lectures 27:19

Arrays are one of the fundamental data structures used in data analysis to store various types of data. They are also a quick way to store columns or dimensions in data, for statistical analysis as well as exploratory analysis through plots and visualization. Arrays are also very easy to plot, as they are simple. When a visualization is being done with two columns of a dataset, it means that the two column values are taken in the form of separate arrays and then plotted against each other, which again makes arrays very important.

Plotting basic arrays
Plotting dataframes

In data science and statistical modeling, there are many instances where an analyst needs to use several functions, both for transforming data and for exploratory analytics. Gadfly makes plotting them simple, whether each function gets its own plot or several functions are stacked in a single plot.

Plotting functions

Exploratory data analytics is one of the most important processes in a data science workflow. It is simply a thorough exploration of the data to find any possible patterns that can be identified through basic statistics and the shape of the data. It is mostly done with the help of plots, as visual information is much easier to comprehend than complex statistical terms. 

Exploratory data analytics through plots

Line plots, as we have already seen in the preceding examples, are very effective when it comes to exploratory data analytics. They can be used both to understand correlations and look at data trends. So, by further making use of aesthetics, we can make them more interesting and informative.

Line plots

Scatter plots are the most basic plots in exploratory analytics. They help the analyst get a rough idea of the data distribution and the relationship between the corresponding columns, which in turn helps identify some prominent patterns in the data.

Scatter plots

Histograms are one of the best ways to visualize a dataset and find its three main measures of central tendency: the mean, median, and mode. They also give analysts a very clear understanding of how the data is distributed. Their ability to represent categorical as well as numerical data is what makes histograms unique.
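Before a histogram is drawn, the data has to be binned. This hand-written Julia sketch is independent of Gadfly, which does the binning for you. It counts the values in each bin for a made-up dataset and reports the tallest bin, which is where the mode lies.

```julia
# Count values per bin defined by sorted edges.
# Bins are left-closed [a, b); the final bin also includes its right edge.
function histogram_counts(x::AbstractVector{<:Real}, edges::AbstractVector{<:Real})
    counts = zeros(Int, length(edges) - 1)
    for v in x
        i = searchsortedlast(edges, v)   # index of the last edge <= v
        if 1 <= i < length(edges)
            counts[i] += 1
        elseif v == edges[end]
            counts[end] += 1             # value sits exactly on the last edge
        end                              # values outside the edges are ignored
    end
    return counts
end

data = [1.0, 2.0, 2.5, 3.0, 3.5, 3.8, 5.0]   # hypothetical observations
edges = [0.0, 2.0, 4.0, 6.0]                  # bins [0, 2), [2, 4), [4, 6]
counts = histogram_counts(data, edges)
println(counts)

# The tallest bin marks where the mode lies
k = argmax(counts)
println(edges[k], " to ", edges[k + 1])
```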


Having covered how to plot the most important visualizations and their basic customizations in the Gadfly library, we will now see how to customize them even further.

Aesthetic customizations
Quiz Time!
2 questions
+ Parallel Computing
4 lectures 13:07

Parallel computing splits a computation into parts that are carried out at the same time. This can be done, for example, by connecting multiple computers as a cluster and using their CPUs to carry out the computations.

Basic concepts of parallel computing

In parallel computing, moving data between processes is common, and it should be minimized because every transfer adds time and network overhead.

Data movement
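The Julia manual illustrates data movement by contrasting two ways of computing with a matrix on another process. The sketch below follows that idea with the Distributed standard library. In the first version, the matrix is built locally and must be shipped to the worker. In the second, it is built on the worker and only a scalar comes back. No workers are added here (calling `addprocs` would start them), so both calls run on the master process. The matrix size is arbitrary.

```julia
using Distributed

n = 200  # hypothetical matrix size

# Version 1: A lives on the caller, so it must be serialized and sent to the worker
A = rand(n, n)
ref_sent = @spawnat :any sum(A * A)

# Version 2: the matrix is created on the worker; only the scalar result travels back
ref_local = @spawnat :any begin
    B = rand(n, n)
    sum(B * B)
end

println(fetch(ref_sent), " ", fetch(ref_local))
```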

In this video, you will learn a bit about the famous Map-Reduce framework and why it is one of the most important ideas in the domains of big data and parallel computing. 

Parallel maps and loop operations
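In Julia, the map and reduce halves of this pattern correspond roughly to `pmap` and to `@distributed` with a reducer function. The minimal sketch below uses the Distributed standard library. If `addprocs` is not called, both constructs run on the master process, but the code is unchanged when workers are available.

```julia
using Distributed

# Parallel map: apply a function to each element; results come back in order
squares = pmap(x -> x^2, 1:10)

# Parallel reduction over a loop: each process sums its chunk,
# then (+) combines the partial sums
total = @distributed (+) for i in 1:100
    i^2
end

println(squares)
println(total)
```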

Channels are like background plumbing for parallel computing in Julia. They are the reservoirs from which the individual processes access their data.
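Here is a small sketch of channels inside a single process. A producer fills a buffered `Channel`, several tasks consume from it, and the results flow into a second channel. For communication across processes, the Distributed standard library provides `RemoteChannel`. The function and parameter names below are hypothetical.

```julia
# Producer/consumer pipeline built on two buffered channels
function square_all(inputs::AbstractVector{Int}; nconsumers::Int=3)
    jobs = Channel{Int}(length(inputs))      # buffered so the producer never blocks
    results = Channel{Int}(length(inputs))
    for x in inputs
        put!(jobs, x)
    end
    close(jobs)                              # consumers stop once jobs is drained

    @sync for _ in 1:nconsumers
        @async for x in jobs                 # tasks can safely take from one channel
            put!(results, x^2)
        end
    end
    close(results)
    return sort(collect(results))            # completion order may vary, so sort
end

println(square_all([1, 2, 3, 4, 5]))
```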

Quiz Time!
2 questions