Graph Theory Introduction

A free video tutorial from William Fiset
Google engineer; ACM-ICPC world finalist
Rating: 4.7 out of 5Instructor rating
2 courses
113,180 students
Graph Theory Introduction

Learn more from the full course

Graph Theory Algorithms

A complete overview of graph theory algorithms in computer science and mathematics.

09:02:45 of on-demand video • Updated July 2020

Storage and representation of graphs (networks) on a computer
Common graph theory problems
Breadth first search algorithm
Depth first search algorithm
Various tree algorithms including: the height or a tree, finding the center of a tree, rooting a tree, and etc...
Dijkstra's algorithm
Topological sort algorithm
Shortest/longest path on a acyclic graph
Bellman Ford's algorithm
Floyd-Warshall all pairs shortest path algorithm
Finding bridges/articulation points
Finding strongly connected components (Tarjan's)
Travelling salesman problem (TSP)
How to find the maximum flow of a flow graph
Finding bipartite graph matchings
Various network flow algorithms including: Edmonds-Karp, Capacity Scaling, and Dinic's algorithm
Kruskal's Minimum Spanning Tree algorithm
The Lowest Common Ancestor (LCA) Problem
English [Auto]
Hello and welcome. My name is William and I'm super excited to bring to you this video series focused on graph theory. Graph theory is one of my absolute favorite topics in computer science. We're going to see a lot of very awesome algorithms. The whole field is very diverse and hugely applicable to real world applications. I think everybody should be able to learn love and enjoy Graph theory. These first few videos are going to be a ramp up videos to introduce the topics of how we store, represent and traverse graphs on a computer. By the way, this whole video series will be taking on a computer science point of view of graph theory rather than a mathematical one. So we won't be covering proofs and so on per se. Instead, we'll be looking at algorithm implementation details and code. So what is graph theory? In essence, it is the study of properties and applications of graphs which commonfolk or non mathematical folks call networks. This is a very broad topic and my goal with this video series is to teach you how to apply graph theory to real world situations. Graphs can be used to represent almost any problem, which makes them so interesting because they pop up absolutely everywhere. A simple problem that can be phrased as a graph theory problem might be given the constraints in this picture, How many different sets of clothes can I make choosing an article from each category? Of course, this could be phrased and solved using only mathematics, but the advantage to graph theory is that it allows us to visualize the problem using nodes to represent an article of clothing and edges to represent relationships between them. Another canonical example of a graph theory problem is a social network of friends. A graph representation enables us to answer interesting questions such as how many friends does person X have or how many degrees of separation are there between person X and person Y? Now we have to talk about different types of graphs. There are many different types of graph representations and it's really important, I mean, really important to be able to recognize what type of graph you're working with, and especially when you're programming and trying to solve a particular problem. This first type of graph is an undirected graph. It's the most simple kind of graph you'll encounter, and it is where edges have no orientation. That is, if there's an edge from Node U to Node V, it is identical to the edge from V to U. For instance. In the following graph nodes are cities and edges represent bidirectional roads, since if you drive from one city to another, you can always retrace your steps by driving the other way. In contrast to undirected graphs, there are also directed graphs, sometimes called die graphs. In these graphs, you've guessed it the edges are directed. So if we have an edge from U to V, then you can only go from node U to node V, not the other way around. In this graph you can see that the edges are directed because of the arrowheads on the edges between nodes. This graph could represent people who bought each other gifts. So an incoming edge represents receiving a gift and an outgoing edge represents giving a gift. Therefore person e in this graph bought person d a gift person a bought themselves and person B a gift and person F bought nobody any gifts and received none. So far we've only seen unweighted graphs, but edges on graphs can contain weights to represent arbitrary values such as cost, distance, quantity, you name it. Weighted graphs come in both directed and undirected flavors. As a side note, I will usually denote an edge of a graph as a triplet u v w to indicate where an edge is coming from, where it's going to, and what its weight is. Of course, with this notation I also need to specify whether the graph is directed or undirected. Next up, I just want to have a quick chat about special types of graphs and graph theory. There are so many different types of graphs that I only had to select a few, which will be most relevant for this upcoming video series. The most important type of special graph is definitely the tree. A tree is simply an undirected graph with no cycles. There are several equivalent definitions of a tree, such as a graph with n nodes and n minus one edges. All the graphs below are trees. Yes, even the leftmost one, since it has no cycles. A related but totally different type of graph is a rooted tree. The distinction here is that a rooted tree has a designated root node where every edge either points away from or towards the root node. When edges point away from the root node, the graph is called an arborescence or an out tree and an anti arborescence or entry. Otherwise, out trees are by far more common than in trees. From what I've observed. It is also fairly common for people to refer to a rooted tree simply as a tree instead of an in or out tree. But there is an important distinction there. Next are directed acyclic graphs. These are graphs with directed edges and no cycles. These graphs are very important and fairly common in computer science, actually, since they often represent structures with dependencies such as a scheduler, a build system, a compiler maybe, or perhaps more relatable university class prerequisites. There are several efficient algorithms we'll be looking at to deal specifically with directed acyclic graphs such as how to find the shortest path and produce a topological ordering of nodes. A topological ordering of nodes is an ordering of nodes that tells you how to process the nodes in the graph so you don't perform a task before first having completed all its dependencies. For example, a topological ordering of class prerequisites would tell you to take intro biology and intro chemistry before taking a class on, say, genomics. This next type of special graph is a bipartite graph. It is one whose vertices can be split into two independent groups U and V, such that every edge connects between U and V. This is just a fancy way of saying that the graph is two colorable or that there are no odd length cycles in the graph. Often a problem we like to ask is what is the maximum matching we can create on a bipartite graph? Suppose white nodes are jobs and red nodes are people. Then we can ask how many people can be matched to jobs. In this case, there are a lot of edges in each graph. So I think the answer for both is four. But in general, it's not so easy. If there are less edges, tougher constraints and more conflicts. Bipartite graphs also play a critical role in the field of network flow, which we will talk about later. This last type of graph is a complete graph. It is one where there is a unique edge between every pair of nodes in the graph. A complete graph with n vertices is denoted as the graph k sub n. I have listed k one through k six on the bottom and you can easily see how this scales when we add more nodes. Complete graphs are often seen as the worst case possible graph you can possibly encounter because of how many edges there are. So if you want to test your algorithm for performance, a complete graph is an easy way to start. One thing we're going to have to be really cognizant about is how we're actually representing our graphs on the computer. This isn't just what type of graph it is, but what type of data structure it. But what type of data structure are we representing our graph with? And this can have a huge impact on performance. The simplest way is inside a D adjacency matrix. The idea is that the cell m i j represents the edge weight of going from node I to node j. So in the graph below there are four nodes, so I create a 4x4 matrix and populate the graph with the edge weights. If you look at the edge weight from node C to Node D, you'll see that it has an edge weight of two. So in row three and column four of the matrix, there is a value of two. Note that is often assumed that the edge of going from a node to itself has a cost of zero, which is why the diagonal of the matrix has all zero values. This matrix form has several advantages. First, that it's very space efficient for dense graphs. That is graphs with a lot of edges. The edge weight lookup can be found in constant time, which is quite nice. And lastly, I would argue that it is the simplest form of graph representation you can have. On the downside, however, the main reason people don't go for the adjacency matrix as their first pick is because it requires V squared space, which is well, a lot of space in practice. Graphs with 10,000 nodes or more start to become infeasible very quickly. The other issue with the adjacency matrix is that it requires V squared work to iterate over all the edges of your graph. This is fine for dense graphs with lots of edges, but it isn't so great for sparse graphs since most cells will be empty. The main alternative to the adjacency matrix is the adjacency list, which is a way to represent a graph as a map of nodes to a list of edges. The idea is that each node tracks all of its outgoing edges. For example, Node C has three outgoing edges, so the map entry for C will track the edge from C to A with cost for the edge from C to B with cost one and the edge from C to D with cost two. Notice that in the list of edges we only need to track two things the node we're going to and the cost to get there. We don't need to keep track of where we came from because that's already implicitly known. The nice thing about adjacency lists is that it is great for sparse graphs because it only tracks the edges that you have and doesn't allocate additional memory that you might not use, like the adjacency matrix does. This also means it's efficient when iterating over all the edges. The main disadvantage to using an adjacency list is that it is less space efficient on denser graphs. Another subtle disadvantage is that it takes big o of e time to access a specific edges weight, although in practice you rarely or if ever, actually need to do this. The last representation I want to talk about is the edge list. An edge list is a way to represent a graph simply as an unordered list of edges. Basically, it's exactly what it sounds like. A list of edges. Assume that the notation for any triplet u v. W means the cost from node u to node V is w. So for this graph, the edge list is simply a list of six edges represented as those triplets. This representation is very simple. However, it lacks structure and that's why it is seldomly used. Advantage to the edge list is that it's great for sparse graphs. Iterating over all the edges is super easy and the structure is simple. The downside is that edge lookup can be slow and you can run into memory issues on large graphs.