# Practical Introduction to Information Theory

## What you'll learn

- Learn how to formulate problems as probability problems.
- Solve probability problems using information theory.
- Understand how information theory is the basis of a machine learning approach.
- Learn the mathematical basis of information theory.
- Identify the differences between the maximum likelihood approach and the entropy approach.
- Understand the basics of the use of entropy for thermodynamics.
- Calculate the molecular energy distributions using Excel.
- Learn to apply Information Theory to various applications such as mineral processing, elections, games and puzzles.
- Learn how Excel can be applied to Information Theory problems. This includes using Goal Seek and Excel Solver.
- Understand how a Logit Transform can be applied to a probability distribution.
- Apply Logit Transform to probability problems to enable Excel Solver to be successfully applied.
- Solve mineral processing mass balancing problems using information theory, and compare with conventional least squares approaches.

## Requirements

- Required skills: knowledge of calculus, some understanding of probability theory.
- Prerequisites: You will use advanced Excel skills such as cell styles, Named Ranges, Arrays, Excel Solver and Goal Seek. Recommended prerequisites for these skills are the Udemy courses: Effective Use of Named Ranges, Mastering Named Ranges, Arrays and VBA in Excel, Mass Balancing using Excel Solver.
- The course does include some VBA skills but they are not essential as a prerequisite. Those learners with stronger VBA skills may choose to do some of the exercises in greater depth.
- The course will briefly explain mathematical methods such as Lagrange Multipliers, but only at a depth sufficient for understanding the course problems. Learners with a strong understanding of Lagrange Multipliers may choose to do some of the exercises in greater depth.
- The course is based on the version of *Excel* in Microsoft Office 365. The course has not been applied using other versions of *Excel*.

## Description

*Section 1 Introduction*

Information theory is also called ‘the method of maximum entropy’. Here the title ‘information theory’ is preferred as it is more consistent with the course’s objective – that is to provide plausible deductions based on available information.

Information theory is a relatively new branch of applied and probabilistic mathematics – and is relevant to any problem that can be described as a probability problem.

Hence there are two primary skills taught in this course.

1. How to formulate problems as probability problems.

2. How to then solve those problems using information theory.

Section 1 largely focuses on the probabilistic basis of information theory.

**Lecture 1 Scope of Course**

In this lecture we discuss the scope of the course. Whilst Section 1 provides the probabilistic basis of information theory, the remaining sections are:

2. The thermodynamic perspective

3. Mineral Processing Applications

4. Other examples

5. Games and puzzles

6. Close

Information theory is the foundation of a particular approach to machine learning. The scope of this course does not go deeply into machine learning, but it does provide some insight into how information theory underpins that approach.

**Lecture 2 Combinatorics as a basis for Probability Theory**

This course does not aim to be a probability course – and only provides a probability perspective relevant to information theory.

In this lecture you will learn basic probability notation and formulae.

By completing the exercises, you will develop an understanding of fundamental functions such as the factorial and the combination.
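The two fundamental functions can be sketched in Python (the course itself works in *Excel*, where the equivalents are FACT and COMBIN):

```python
from math import factorial, comb

# Number of ways to order 5 distinct items
assert factorial(5) == 120

# Number of ways to choose 2 items from 5 when order does not matter:
# C(5, 2) = 5! / (2! * 3!) = 10
assert comb(5, 2) == 10
```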

**Lecture 3 Probability Theory**

In this lecture you will learn basic probability theory, with the primary focus being the binomial distribution. Here probability theory starts with single-event probabilities, which are used to estimate the probabilities of multiple events. The single-event probabilities are equitable; i.e. the probability of choosing a particular coloured ball is 50%.

By applying the exercises, you will solve simple multiple-event probability problems building on the functions learnt in the previous lecture.
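As a minimal sketch of the kind of multiple-event calculation involved (in Python rather than the course's *Excel*), the binomial probability of k successes in n trials combines the combination function with the single-event probability:

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials,
    each with single-event probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# With an equitable single-event probability (p = 0.5), the chance of
# exactly 2 heads in 4 coin tosses is C(4, 2) / 2**4 = 6/16 = 0.375
prob = binomial_pmf(2, 4, 0.5)
assert abs(prob - 0.375) < 1e-12
```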

**Lecture 4 Non-equitable Probability**

In the previous lecture the focus was on the binomial distribution where the probabilities of single events are equal. Here we focus on the binomial distribution where single-event probabilities are unequal; i.e. the probability of choosing a particular coloured ball is no longer 50%.

By completing the exercises, you will solve more complex probability problems than previously. These problems require you to calculate multi-event probabilities where probabilities of single events are not equitable.
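The same binomial formula handles the non-equitable case; a short Python sketch (illustrative values, not from the course exercises) confirms that the multi-event probabilities still form a valid distribution:

```python
from math import comb

# Single-event probability is no longer 50%
p, n = 0.3, 4
dist = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

# The multi-event probabilities still sum to 1
assert abs(sum(dist) - 1.0) < 1e-12
```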

**Lecture 5 Introduction to Information Theory**

In this lecture you will learn the mathematical basis of information theory. In particular, the mathematical definition of entropy (as used in the course) is defined.

By applying the exercises, you will understand the entropy function and how its value changes as a function of probability. You will identify that the entropy is 0 for probability values of 0 and 1.
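The behaviour described above can be checked with a small Python function (a sketch of the two-outcome entropy, assuming natural logarithms):

```python
from math import log

def entropy(p):
    """Entropy of a two-outcome distribution (in nats).
    By the convention 0*log(0) = 0, entropy is 0 at p = 0 and p = 1."""
    return -sum(q * log(q) for q in (p, 1 - p) if q > 0)

assert entropy(0.0) == 0.0 and entropy(1.0) == 0.0
assert entropy(0.5) > entropy(0.3)  # maximised at p = 0.5
```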

**Lecture 6 Relative Entropy**

The ‘pure’ form of entropy is based on equitable probabilities. Relative entropy is based on non-equitable probabilities. The implication of this approach is that one can use “prior probabilities”. In contrast, entropy uses “ignorant priors”. You will therefore learn that you can use available information as a prior, which in turn leads to more plausible solutions (than not using available information). This is the main reason why the course uses the expression ‘Information theory’ rather than “maximum entropy”.

In the exercise you will recognise how the relative entropy function is slightly different to the entropy function. In particular, the relative entropy is maximised when the probability equals the prior probability.
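A minimal Python sketch of the relative entropy function (here written so that larger is better, consistent with maximising entropy) shows the property noted above:

```python
from math import log

def relative_entropy(p, prior):
    """Relative entropy of distribution p against a prior
    (the negative Kullback-Leibler divergence). It is maximised,
    with value 0, when p equals the prior."""
    return -sum(pi * log(pi / qi) for pi, qi in zip(p, prior) if pi > 0)

prior = [0.7, 0.3]
assert relative_entropy(prior, prior) == 0.0       # maximum at the prior
assert relative_entropy([0.5, 0.5], prior) < 0.0   # any departure is lower
```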

**Lecture 7 Bayes Work**

Most probability theory is based on forward modelling. That is, given a scenario, what will be the outcome? Bayes considered the problem of probabilistic inverse modelling: given the outcome, determine the initial scenario.
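A standard textbook illustration of this inverse reasoning (not a course exercise) is the two-urn problem, worked here in Python:

```python
# Two urns: A holds 3 red / 1 blue ball, B holds 1 red / 3 blue.
# An urn is picked at random and a red ball is drawn.
# Inverse question: what is the probability the urn was A?
p_A, p_B = 0.5, 0.5            # prior probabilities of each urn
p_red_A, p_red_B = 0.75, 0.25  # likelihood of red given each urn

p_red = p_A * p_red_A + p_B * p_red_B          # total probability of red
p_A_given_red = p_A * p_red_A / p_red          # Bayes' theorem
assert abs(p_A_given_red - 0.75) < 1e-12
```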

**Lecture 8 – Simple Application of Entropy – Dice throwing.**

In this lecture you will be given a simple application: a biased die. Using *Excel Solver* and information theory you will calculate the most likely probability of throwing a ‘six’ (given that the die is biased). The lecture introduces Lagrange multipliers for constrained optimisation.
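As a Python sketch of the same idea (the course uses *Excel Solver*): the Lagrange-multiplier conditions give face probabilities of the exponential form p_i ∝ exp(λi), and λ can be found numerically so that the constrained mean is matched. The function name and the bisection search are illustrative choices, not the course's method:

```python
from math import exp

def maxent_dice(mean_target, lo=-5.0, hi=5.0, iters=100):
    """Maximum-entropy face probabilities for a six-sided die
    constrained to a given mean throw. The multiplier lam is found
    by bisection (mean is monotonic in lam)."""
    def mean_for(lam):
        w = [exp(lam * i) for i in range(1, 7)]
        return sum(i * wi for i, wi in zip(range(1, 7), w)) / sum(w)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if mean_for(mid) < mean_target:
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2
    w = [exp(lam * i) for i in range(1, 7)]
    z = sum(w)
    return [wi / z for wi in w]

p = maxent_dice(4.5)  # a biased die whose average throw is 4.5
assert abs(sum(p) - 1.0) < 1e-9
assert abs(sum(i * pi for i, pi in zip(range(1, 7), p)) - 4.5) < 1e-6
```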

**Lecture 9 Statistical Concepts**

Entropy is fundamentally based on the binomial distribution. The most common distribution used in statistics is the normal (or Gaussian) distribution. An understanding of how the binomial distribution relates to the normal distribution is of value. If we use the normal distribution as an approximation to the binomial distribution and then consider the entropy of this normal distribution, we derive a simple quadratic function. A quadratic function has advantages: for example, in some cases it is easier for *Excel* *Solver* to determine the maximum entropy for a constrained problem.

Here we compare the two approaches, binomial versus normal, and consider the strengths and weaknesses of each.

In the exercise you will chart the entropy as a function of probability for both the binomial distribution and the Gaussian distribution, and you will understand that for small departures from the prior, the two methods are very similar.
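The closeness of the two functions near the prior can be sketched in Python (illustrative values; the quadratic is the second-order approximation to the two-outcome relative entropy):

```python
from math import log

def rel_entropy(p, q):
    """Exact relative entropy of a two-outcome distribution vs prior q."""
    return -(p * log(p / q) + (1 - p) * log((1 - p) / (1 - q)))

def quadratic(p, q):
    """Gaussian (second-order) approximation: a simple quadratic in p."""
    return -(p - q) ** 2 / (2 * q * (1 - q))

q = 0.4  # prior probability
# For a small departure from the prior, the two methods nearly agree
assert abs(rel_entropy(0.42, q) - quadratic(0.42, q)) < 1e-4
```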

**Lecture 10 Comparison of Maximum Likelihood Theory to Entropy**

Maximum likelihood theory is largely a statistical concept whereas entropy is a probabilistic concept. Yet for some applications, the solution is focused on a maximum likelihood approach where an entropy approach could have been used as an alternative. Hence in this lecture you will learn the difference between the two approaches.

*Section 2 The Thermodynamic Perspective of Entropy*

Entropy as a word was first defined by physicists. Consequently, there is considerable confusion as to whether an understanding of entropy requires an understanding of thermodynamics. For this reason, this course is titled “Information Theory” rather than maximum entropy. However, entropy is used in thermodynamics – and is indeed related to entropy as used in information theory.

In this section the use of entropy by physicists is discussed, as well as how its use in thermodynamics relates to wider applications.

This section provides an introduction into the use of entropy for thermodynamics (and the basis for statistical mechanics) but is not a comprehensive course in thermodynamics.

**Lecture 11 Thermodynamic Perspective**

This lecture largely focuses on the historical basis of entropy as used in thermodynamics. It traces the emergence of the branch of mathematical physics called statistical mechanics.

**Lecture 12 Calculate Molecular Energy Distribution.**

In this lecture the mathematical basis of molecular energy distributions is presented.

**Lecture 13 Practical Application of Thermodynamics**

In this lecture you will calculate the molecular energy distributions using *Excel*. You will solve the problem using *Goal Seek*.

**Lecture 14 Reconciling Thermodynamics with Information Theory.**

This lecture summarises thermodynamics using the information theory approach – with the intent that you will understand that ‘entropy’ in this course is a probabilistic concept only, and that whilst information theory has application to thermodynamics, you do not need a deep understanding of thermodynamics to solve the many problems in information theory.

*Section 3 Mineral Processing Applications*

The lecturer was a researcher in mineral processing for some 20 years. He recognised that many mineral processing problems could be solved using information theory. In this section a set of mineral processing problems is given:

- Comminution
- Conservation of assays
- Mass balancing
- Mineral to element
- Element to mineral

The section also discusses advanced methods such as:

- collating entropy functions in a VBA addin
- using the logit transform to enable *Excel Solver* to solve problems that it would otherwise not be able to solve.

Although the applications are drawn from mineral processing, they will be understandable even if you are unfamiliar with the field.

**Lecture 15 Overview of Examples**

The course now focuses on a wider set of applications than thermodynamics. Here an overview of applications discussed in the course is given. The lecture also discusses some applications of information theory that are not within the scope of the course.

There are four groups of applications:

- Thermodynamics
- Mineral processing (which is also relevant to chemical engineering)
- Other examples (such as elections)
- Games and puzzles

**Lecture 16 Application to Comminution**

The first mineral processing example is comminution (which is breakage of particles).

In this lecture entropy is used as a predictive methodology to estimate the size distribution of particles, if particles are broken to a known average size.

**Lecture 17 Comminution Exercise**

In this lecture you will apply the methods of the previous lecture using *Excel*. You will learn what is meant by a feed size distribution and what is meant by a product size distribution. The product size distribution can be considered a probability distribution, and the product size distribution before breaking particles to a smaller size is considered as a prior distribution. Hence the approach uses relative entropy.

**Lecture 18 Conservation of Assays**

In mineral processing, there are ‘assays’ which are measures of either the elements or minerals. Here we focus on elements.

In this lecture you will perform a simple mass balance (or reconciliation) to ensure assays are conserved.

That is, when the particles are broken to a smaller size, if the assays for each size-class are assumed to be the same as in the initial size distribution, then there is an inconsistency in the overall mineral content. The assays therefore need to be ‘reconciled’. Using information theory (relative entropy), the assays of the product are adjusted.

This is an example of applying information theory as a number of subproblems. Subproblem 1 is adjusting the size distribution; Subproblem 2 is adjusting the assays.

**Lecture 19 Creating Entropy Addins**

Thus far, all formulae have been entered directly in *Excel*. However, as the *Excel* functions become more complex it is both tedious and inefficient to type them in from scratch. In this lecture you are shown functions that are compiled within an addin, and then how to install them in a workbook.

If you are familiar with *VBA* you can explore the functions in detail.

**Lecture 20 Mass Balancing**

Mass balancing (or data reconciliation) is the method used in chemical engineering and mineral processing to adjust experimental data so that they are consistent. In this lecture you will learn how this problem can be solved using information theory, and how this approach is easier than the conventional method.

**Lecture 21 Mass Balancing Excel**

Using *Excel*, you will seek to solve a mass balance problem; and it will be shown that *Excel* *Solver* is not able to solve the problem as is. It may seem strange to have a lecture where the exercise does not solve the problem satisfactorily. However from this exercise you will learn more about the strengths and weaknesses of *Excel Solver*. In later lectures you will learn how to adjust the problem so that the solution can be determined using *Excel Solver*.

**Lecture 22 Logit Transform**

The logit transform is a method to convert a probability in the region [0, 1] to a variable in the region [-∞, ∞]. Conversely, the inverse logit transform converts a variable from the region [-∞, ∞] back to the region [0, 1]. This transform allows probability problems to be more easily solved using *Excel* *Solver*.
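The transform pair can be written down in a few lines of Python (a sketch of the standard definitions, not the course's *Excel* formulae):

```python
from math import log, exp

def logit(p):
    """Map a probability in (0, 1) to an unbounded variable z."""
    return log(p / (1 - p))

def inv_logit(z):
    """Map an unbounded variable z back to a probability in (0, 1)."""
    return 1 / (1 + exp(-z))

# Round trip: a solver can adjust z freely, with no bound constraints,
# while inv_logit(z) is always a valid probability.
p = 0.8
assert abs(inv_logit(logit(p)) - p) < 1e-12
```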

**Lecture 23 Logit Transform Exercise**

You will continue the problem given in Lecture 21 by applying the logit transform.

The transformed variables are called Z, and these variables are adjusted via *Excel* *Solver* to maximise the entropy.

**Lecture 24 Mineral To Element**

Mineral to element is not specifically an application of information theory – rather, it provides the basis for the later information theory problem of element to mineral.

In the exercise you will take a distribution of minerals and from that calculate the distribution of elements. You will learn how to do this using two methods:

1. Conventional Sumproduct

2. Using matrix equations.

The second method is mainly suited to those with mathematical knowledge of matrices, but the exercise remains accessible if you have not had previous exposure to matrices.
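The two methods compute the same thing; a Python sketch with hypothetical composition values (the Fe and S fractions below are illustrative, not the course's data) shows the SUMPRODUCT form, which is equivalently a matrix-vector product:

```python
# Hypothetical two-mineral example (illustrative composition values).
# Rows = elements (Fe, S); columns = minerals (e.g. pyrite, chalcopyrite).
A = [[0.465, 0.305],   # mass fraction of Fe in each mineral
     [0.535, 0.349]]   # mass fraction of S in each mineral
minerals = [0.6, 0.4]  # mass fraction of each mineral in the sample

# SUMPRODUCT equivalent: each element assay is a weighted sum of the
# mineral fractions (the matrix-vector product of A with minerals)
elements = [sum(a * m for a, m in zip(row, minerals)) for row in A]
assert abs(elements[0] - 0.401) < 1e-9  # Fe assay
```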

**Lecture 25 Correcting Mineral Compositions based on Assays**

This lecture focuses on the inverse problem of Lecture 24; that is, using elements to calculate minerals. Here you are reintroduced to the issue of problems being:

- Ill-posed (non-unique solution)
- Well-posed (unique solution)
- Over-posed (more information than is required).

The problem of element to mineral (as given in the exercise) is an ill-posed problem, and information theory is applied by using any previous mineral distribution as a prior.

In the exercise you will calculate the mineral composition given the elements.

*Section 4 Other Applications*

In this section the lectures cover diverse applications that could not be pooled into a single theme. Alongside these applications, a number of new concepts are introduced:

- the use of weighting by number
- dealing with network systems.

**Lecture 26 Elections**

Elections can be considered as a probability problem. Consequently, if we can predict the general swing – we can use information theory to estimate the results for each electorate.

**Lecture 27 Elections Exercise**

In this exercise you will apply the knowledge of the previous lecture to predict the outcome of an election. You will learn further skills using *Goal Seek.*

In the exercise you will identify the required swing in support for a party to win the election. You will also predict the results for each electorate. However, there is a problem with the way entropy is used in the exercise which is discussed in later lectures.

**Lecture 28 The Problem of Weighting**

For entropy problems thus far, the entropy has largely been based on the presumption that the total number can be ignored. In this lecture it is explained that ignoring the number is a fallacy. The exercise demonstrates that by incorporating the number in the entropy more realistic solutions can be obtained.

**Lecture 29 Weighting Exercise**

You will return to the Mass Balance problem and ensure that the entropy is weighted by solid flow, yielding results that are more plausible.

**Lecture 30 Network Problems**

A network problem is one where the problem can be visualised by a flowchart and objects go through various processes.

It is shown that information theory has a paradox when it comes to network problems. To overcome the problem a new syntax is presented where variables are differentiable in some equations – but not differentiable in other equations.

In the exercise you will understand the paradox via a simple network problem.

*Section 5 Games and Puzzles*

In this section we apply information theory to games and puzzles. Currently this section focuses on two games: Mastermind and Cluedo. You will identify that these two games are well suited to information theory, and you will set up solutions to these games as probabilistic systems.

**Lecture 31 Mastermind - explanation**

Mastermind is a logic game for two or a puzzle for one. It involves trying to crack a code. In this lecture the puzzle is explained so that successive lectures can focus on using information theory to solve Mastermind.

In this exercise you will play Mastermind in order to become familiar with the game for later lectures.

**Lecture 32 Applying Information Theory to Mastermind**

Mastermind is a perfect example of the application of information theory. Here you will learn the methodology for solving Mastermind. The puzzle involves using ‘trials’ from which information is obtained about a hidden code. Information theory is applied using the new information to update the probabilities.

In the exercise you will set up the Mastermind problem using a probabilistic system.

**Lecture 33 Mastermind – Information Theory Exercise**

Having set up Mastermind using a probabilistic system in *Excel*, you will use *Excel* *Solver* to identify the solution.

**Lecture 34 Playing Cluedo using Information Theory**

Cluedo is a ‘murder’ board game – but is based on logic. You will learn how to use information theory to solve the murder before any other player. The exercise only focuses on setting up the problem as the solution approach is analogous to Mastermind.

*Section 6. Close*

You will be reminded of what was covered in the course. You will consider further applications. You will learn the background to the course and mineral processing problems suited to Information Theory. Future and potential courses will be discussed.

**Lecture 35 Closing Remarks**

In this lecture a brief summary of the course is given, as well as acknowledgements and references.

## Who this course is for:

- This course is designed for: mathematicians seeking to learn information theory, physicists who are specifically interested in the relevance of entropy to thermodynamics, mineral processors and chemical engineers who are interested in modern analytical methods, and machine learning experts who are interested in a machine learning approach based on information theory.
- The course is particularly suited to graduating high school students seeking to enrol in Engineering or quantitative sciences.

## Instructor

Dr Stephen Rayward is the main instructor. Stephen has a diverse science, mathematics, engineering and software development background. Stephen has 40 years’ experience in Mathematical Modelling, Engineering and development of both commercial and research software. He provides courses primarily in simulation internationally. He is the author of some 60 refereed papers, and some 20 LinkedIn articles; and he is well-known for his approachability, enthusiasm and considered views. He has developed various commercial software packages. Whilst Stephen used *Excel* in his professional research career, he was introduced to a complex workbook only about 10 years ago. Stephen was asked to convert the workbook into something more manageable, and started this task by creating a flowchart. Stephen quickly realised that the *Excel* workbook was unstructured and difficult to follow. He also realised that although *Excel* had lots of great functionality, it was also limited particularly in respect to creating a flowchart.

Stephen decided to branch out independently (forming his company Midas Tech – MIDAS being an acronym for Mining Industry Data Analytics Service) and worked for numerous companies, while simultaneously developing his own *Excel* addins and commercial software. Stephen was invited to give courses internationally (both in mineral processing and *Excel*). The *Excel* component was generally labelled “*Professional Excel*”, and Stephen’s logical and structured approach to *Excel* was labelled Excel Engineering.

Stephen’s *Excel* courses are generally targeted to professionals who use *Excel* on a regular basis.

LinkedIn profile: Stephen Rayward