# Information Theory (IT) as a basis for Machine Learning (ML)

## What you'll learn

- Automate Excel Solver for different trials – which in turn requires learning and using Named Ranges.
- For Visual Basic for Applications (VBA) developers understand the VBA code required to automate Excel.
- To understand how Metropolis-Hastings can be applied to identify probable solutions.
- To understand Simulated Annealing.
- To learn how to apply Simulated Annealing (together with Metropolis-Hastings to identify a probable feasible solution.
- To ascertain how all these techniques are combined to create a Machine Learning strategy.

## Requirements

- Participants should have access to a computer with Microsoft Excel.
- Strong Excel skills are not required but are encouraged.
- Use of a Mac-based operating system is unsuitable.
- The course uses two courses as prerequisites: 'Practical Introduction to Information Theory' and 'Geometric Introduction to Markov Chain Monte Carlo'.
- The instructor aimed to make the course understandable without the prerequisites. However those who would like the most complete understanding of the course are encouraged to complete the prerequisites. However Practical introduction to Information Theory is mainly suitable to students with strong High school mathematics.

## Description

Machine Learning is a modern branch of mathematics and data science. To some extent, Machine Learning is not well-defined. Here Machine Learning means an algorithm that can adapt to find better estimates of scenarios using available information.

Similarly Information Theory can be considered as a new branch of mathematics, and one approach of applying Machine Learning is to use Information Theory as the basis.

Hence the objective of the course is to explain the basis of a Machine Learning approach using Information Theory.

The game Mastermind is used as the focus. Hence a sub-objective is to demonstrate and explain how the game Mastermind can be solved using a Machine Learning (ML) approach.

The methodology links the following mathematical techniques:

Representation of a problem as a probabilistic system,

Identifying the solution to the probability system using Information Theory, and using as the software basis for this step

*Excel Solver*.Identify feasible solutions using Markov Chain Monte Carlo (MCMC) – particularly the Metropolis-Hastings algorithm (MH) together with Simulated Annealing.

**SECTION 1 INTRODUCTION**

You will learn what the overall structure of the course is. You will learn what is meant by Machine Learning, and get a broad understanding of how Information Theory can be used as the basis of Machine Learning. The course is not intended to be a broad Machine Learning course – rather a focused Machine Learning course specifically considering the approach of using Information Theory as a basis.

**Lecture 1 Outline of course**

The user will understand the problem that is to be solved using Machine Learning. This problem is solving a hidden sequence. The game Mastermind encapsulates this problem. You will review previous related Udemy course on Information Theory with Mastermind being an example.

**Lecture 2 Introduction to Machine Learning Concepts**

You will learn what is meant by Machine Learning; and will learn a brief history of Machine Learning. Machine Learning is a very broad term and this course focuses on a particular strategy of Machine Learning which is a probabilistic approach. The course is not a broad Machine Learning course.

**Lecture 3 Comparing Information Theory and conventional Machine Learning methods**

There are many Machine Learning methods. Information Theory is one approach. You will get an overview understanding of how Information Theory is applicable to Machine Learning. In this lecture you will learn the advantages of the Information Theory approach from a holistic viewpoint.

**Lecture 4 Setting up the Excel workbook**

The course mainly focuses on using one workbook. This workbook uses addins. You will learn how to enable macros so that the workbook can run the command buttons, and also enable referencing to required addins.

**SECTION 2 MACHINE LEARNING APPLIED TO MASTERMIND **

This section provides an overview of the problem to be solved; and the approach used.

You will review how Mastermind is more easily solved by collating the information in a probability system. This probability is the probability of digits in particular positions.

In the game Mastermind, one tries to determine a hidden code, also called a hidden sequence. The player creates a ‘trial’ sequence, and the trial is scored based on how much it agrees with the hidden sequence. The Game is set up in *Excel* using 4 positions and 9 digits, so the number of possible solutions is 9*9*9*9.

Using the scores the user keeps estimating trials until they identify the hidden sequence.

Here, instead of the player creating the trials, a computer algorithm is created to guess the trials.

In the solution, a command button in *Excel* is created called ‘Move’. The basis of ‘Move’ is to evaluate the various trials and available probability to determine a probable feasible solution, which becomes the next trial.

The probability can be used to estimate a probable solution. However, the estimated probability is an estimate only for each digit in each position rather than a probability of a full sequence.

Therefore, other methods must be integrated into the ‘Move’ command – and these include Markov Chain Monte Carlo (MCMC) and Simulated Annealing.

**Lecture 5 Mastermind example**

In this lecture, you will learn how Mastermind is solved using a Machine Learning approach. The lecture provides an overview only showing the *Excel* workbook and the method of solution by pressing Command Buttons.

In later lectures, we explain the details of the algorithms.

**Lecture 6 Mastermind as an example of Machine Learning (Schematic explanation)**

In this lecture you will visualise the methodology used to solve the Mastermind problem and how it relates to Machine Learning. You will also learn the benefits of drawing a flowchart of Machine Learning process. In this way the process can be very simply understood by end-users.

**Lecture 7 Methods used to solve Mastermind using Machine Learning.**

Definitions of Machine Learning are quite broad. Hence the instructor defines Machine Learning as appropriate for the game Mastermind.

Here Machine Learning is defined as a computer approach that iteratively uses available information to estimate something that is otherwise unknown.

Solving the game is a **process**, the end objective is to estimate the hidden sequence.

By using a trial, a score is created which gives us information about the hidden sequence, a scenario. The scenario is used by the ML algorithm to make a decision, and then applies that scenario to the process to get more information.

You will review how Mastermind is more easily solved by collating the information in a probability system. This probability is the probability of particular digits in particular positions.

Through schematic explanations, you will learn how a solution to Mastermind is an example of Machine Learning. In the solution, a command button in *Excel* is created called ‘Move’. The basis of ‘Move’ is to evaluate the various trials and available probability to determine a probable feasible solution, which becomes the next trial.

The probability can be used to estimate a probable solution. However the estimated probability is an estimate only for each digit in each position rather than a probability of a full sequence.

Therefore other methods must be integrated into the ‘Move’ command – and these include Markov Chain Monte Carlo (MCMC) and Simulated Annealing. This command is the focus of the Machine Learning course.

Indeed the specific Markov Chain Monte Carlo technique is called the Metropolis-Hastings algorithm.

**Lecture 8 Mastermind exercise**

In this lecture, you will learn how Machine Learning is used to estimate a probable reasonable trial. You are invited to determine if you can solve Mastermind better than the algorithm. By doing so the user will learn that well-constructed Machine Learning algorithms can be more efficient than a user. The approach of Machine Learning could arguably be called artificial intelligence although this term is not the subject of the course.

**SECTION 3 AUTOMATING EXCEL SOLVER **

*Excel Solver* is used to update the probabilities after each trial. Solver is automated so that it changes the data it is applied to on each new trial. In this section you will learn how to automate Solver.

This section would be of most interest to advanced mathematical software developers who use *VBA* in *Excel*.

**Lecture 9 Setting up Excel Solver**

*Excel* *Solver* is certainly not the best optimisation algorithm available. However it is available to all *Excel* users as-is, and therefore it is worth using if it can be successfully used.

Solver has limitations, one way to overcome the limitation is to use the ‘logit’ of a variable, which is to be adjusted.

In this lecture, you will review how Solver is set up for estimating the probabilities.

*Solver* is not used for the ML ‘Move’ command.

In this lecture you will learn how *Excel* *Solver* can be applied to data defined by Named Ranges rather than A1 notation. The use of Named Ranges makes it much simpler for the end-user to understand how Solver is being applied to the data.

**Lecture 10 Some Solver Limitations**

Unfortunately, the instructor could not use *Excel* *Solver* as they intended. Instead, some workarounds had to be used. You will learn some of the limitations of *Excel* *Solver* and how these problems can be overcome.

**Lecture 11 Applying Named Ranges to Solver**

Named Ranges are a convenient way to use *Excel* equations and methods rather than the more obtuse A1 notation. In this lecture, you will learn how to create Named Ranges that can be applied to *Excel* *Solver*.

In the algorithm the Named Ranges are dynamic in that some of them extend when new trials are created. This dynamic extension is achieved using *VBA*.

**Lecture 12 Automating Solver using VBA**

It was originally hoped that if the cell address referred to by a Named Range changed then *Excel* *Solver* would use the updated Named Ranges. This did not happen. Instead *VBA* code had to be created to automate *Excel* *Solver* to handle changes in the Named Ranges.

This lecture is of particular interest to *VBA* developers with an interest in mathematical optimisation.

**SECTION 4 METROPOLIS-HASTINGS APPLIED TO MASTERMIND **

You will learn that there are two major algorithms in Markov Chain Monte Carlo. These are the Metropolis-Hastings algorithm (MH) and Gibbs sampling. In this course, MH is used together with Simulated Annealing.

**Lecture 13 Review of Metropolis-Hastings**

The Metropolis-Hastings algorithm is used to choose new sequences – using any previous sequence as a starting point. The new sequences are called test sequences.

MH ensures that new sequences are preferentially selected based on maximising the available probability function.

However moving toward the position where probability is maximised does not guarantee a feasible solution – so preference is also given to test sequences that are more feasible.

**Lecture 14 Simulated Annealing**

Simulated Annealing is an optimisation strategy that is suitable for the Mastermind game. This is because Mastermind is a multidimensional problem, and the available probability function does not incorporate all constraints.

In this lecture you will learn some of the main concepts underlying Simulated Annealing. These include:

Comparison of Simulated Annealing with the hill climbing algorithm (also called steepest descent).

Graphic explanation of Simulated Annealing.

When to use Simulated Annealing.

**Lecture 15 Application of Simulated Annealing to Mastermind**

You will learn the concept of how Simulated Annealing is applied to the game Mastermind. An important concept is understanding the difference between a test sequence and a trial sequence. We apply Simulated Annealing to various test sequences in order to obtain a trial sequence.

Generally Simulated Annealing is applied to the objective function. Although in this case Simulated Annealing is applied to the constraint, the constraint is that the test sequence must be feasible.

A comparative score is applied to the test sequences which is a measure of its feasibility.

This is effectively the same as saying the objective function is the measure of feasibility.

**Lecture 16 Comparative Score**

In this lecture the Comparative Score is discussed in more depth. The Comparative Score is the input to the methodology based on a combination of Simulated Annealing and Markov Chain Monte Carlo.

Here you will learn how the Comparative Score is defined with a score of 0 meaning it is feasible, and larger scores mean larger deviation from feasibility.

The Comparative Score is applied to each Trial (with respect to a Test). The Total Comparative Score is the sum of the Comparative Scores over all Trials (with respect to a Test).

**Lecture 17 Applying Metropolis-Hastings to Mastermind**

In this lecture you will learn how MH is applied. The methodology is very simple and involves changing the digit in a position one at a time.

You will learn the basic algorithm to determine if a new sequence is accepted or rejected based on a random probability (biased toward test sequences that have higher probability).

By combining MH with Simulated Annealing a highly probable feasible sequence is selected.

In this lecture we simply rerun the complete sequence of command buttons to perform a Machine Learning solution to the game Mastermind.

You are invited to see if you can ‘beat’ the ML algorithm by using your logic and intuition. It is expected that you will therefore learn that ML algorithms can provide solutions with less trial sequences than using independent logic.

It can be argued that the algorithm is an example of Artificial Intelligence – however this term has largely been ignored in this course.

**SECTION 5 CLOSE **

You will review the key concepts that you have learnt.

Solving the game Mastermind by using a probabilistic system.

Applying Information Theory to estimate the probability.

Applying the Metropolis-Hastings algorithm to seek a probable solution.

Applying Simulated Annealing to gain a feasible solution.

Applying all the above methods to obtain a feasible probable solution.

To consider the overall approach as a Machine Learning strategy.

**Lecture 18: Conclusions**

Description: as above.

**SECTION 6 OTHER APPLICATIONS **

In this section, other applications that could be solved using similar strategies are discussed. The applications are divided into ‘game and puzzles’ and ‘science/engineering’.

However when we mean ‘similar’ strategies we don’t mean precisely the same strategy as:

Creating a probabilistic representation of the problem,

and linking Information Theory,

Metropolis-Hastings algorithm

and Simulated Annealing.

We simply mean: creating a probabilistic system; inferring information about the probability system, using this information to make a decision, and then obtain some feedback, which is then used to update our inference of the probabilistic system.

**Lecture 19 Applications to other games.**

The purpose of this lecture isn’t to solve a host of games and puzzles, rather to stimulate you to identify how the methods can be applied to games/puzzles. Games and puzzles discussed include:

Bridge

Battleships

Battleship Puzzles

Cluedo

Sudoku

Some of these problems can be solved using Information Theory alone. You will learn that all these games can be set up as probabilistic systems with information providing the constraints.

You are invited to suggest other examples.

**Lecture 20 Application to other science and engineering problems.**

There are many examples of using variations of the strategies discussed here. The primary steps are:

To set up a problem as a probabilistic system.

To apply Information Theory to estimate some scenario.

To test the scenario using some decision strategy.

The instructor’s main background is mineral processing problems.

Examples discussed include:

Tomography

Stereoscopy

Stereology/geometric probability

Mineral Processing Mass balancing

Mineral processing Simulation

You are invited to suggest other examples.

**Lecture 21**

## Who this course is for:

- This course will appeal to a wide variety of participants. The direct participants include the following:
- Those learning the fundamentals of Machine Learning.
- Those who are Machine Learning experts – who wish to broaden their knowledge of Machine Learning algorithms.
- Those with a general interest in mathematics.
- Those who are interested in the application of mathematics to games and puzzles.
- Anyone completing a major in mathematics at University.
- Mathematical researchers (particularly applied mathematicians and probabilists).
- VBA developers and Excel users wanting to know possible diverse applications. Although the course does not go into depth in VBA.
- High school mathematics teachers wanting to show diverse applications of mathematics.
- Indirect participants will include those for which this course will later be used as a prerequisite. This includes engineers such as Mineral Processing Engineers (who have an interest in mathematical modelling).

## Instructor

Dr Stephen Rayward is the main instructor. Stephen has a diverse science, mathematics, engineering and software development background. Stephen has 40 years’ experience in Mathematical Modelling, Engineering and development of both commercial and research software. He provides courses primarily in simulation internationally. He is the author of some 60 refereed papers, and some 20 LinkedIn articles; and he is well-known for his approachability, enthusiasm and considered views. He has developed various commercial software packages. Whilst Stephen used *Excel* in his professional research career, he was introduced to a complex workbook only about 10 years ago. Stephen was asked to convert the workbook into something more manageable, and started this task by creating a flowchart. Stephen quickly realised that the *Excel* workbook was unstructured and difficult to follow. He also realised that although *Excel* had lots of great functionality, it was also limited particularly in respect to creating a flowchart.

Stephen decided to branch out independently (forming his Company Midas Tech – MIDAS being an acronym for Mining Industry Data Analytics Service) and worked for numerous Companies, and simultaneously started developing his own *Excel* addins and commercial software. Stephen was invited to give courses internationally (both in mineral processing and *Excel*). That part which was *Excel* was generally labelled as “*Professional* *Excel*”, and Stephen’s logical and structured approach to *Excel* was labelled as Excel Engineering.

Stephen’s *Excel* courses are generally targeted to professionals who use *Excel* on a regular basis.

LinkedIn profile: Stephen Rayward