
This course is intended for beginner bioinformaticians. It covers the topic of dot plots, starting with a basic introduction to what dot plots are and how a simple dot matrix is generated. The subsequent lectures progress gradually towards more advanced methods, such as sliding windows and substitution matrices, with numerous examples explaining each new concept as it is introduced. The initial lectures show how dot plots can be used to identify insertions and deletions, repeat regions, compositional bias, and domain structure in protein sequences.
When dot plots are applied to genome sequences, the algorithms used by modern software are more complex, because analysis of very long sequences, consisting of up to a few billion symbols, is far too slow unless the algorithms are carefully designed for speed. Recently published dot plot software for genome sequence analysis therefore relies on advanced concepts such as suffix arrays and minimizers. The course attempts to explain these algorithmic ideas in a reasonably easy way, to make them accessible to students regardless of academic background.
The final lectures of the course present application examples from research articles and show, step by step, how the student can reproduce the dot plots published in those articles. The very last lecture gives an overview of the history of research on dot plot algorithms, from the 1970s to the present day. This gives a historical perspective on how the algorithms have advanced: from the time when short sequences of a few hundred symbols were painstakingly analyzed by the slow computers of that era (or even by hand with pen and paper!) to the powerful dot plot tools used today, which can compare medium-sized genomes in a matter of seconds, or two eukaryote genomes containing over a billion symbols in a few minutes.
No programming skills are required. All software demonstrated in the lectures can either be accessed online through web sites or installed on your own computer, so that you can reproduce every example shown in the lectures.