How do a databases work and why are they important?
A free video tutorial from Rares Ilea
Web Applications Developer | IT Consultant
4.4 instructor rating • 2 courses • 24,700 students
Learn more from the full courseThe Complete Database Design & Modeling Beginners Tutorial
Learn Database Design the easy way. Go from simple to complex with a real life example: online store's DB using MySQL.
02:05:14 of on-demand video • Updated April 2019
- Learn what a database is
- Learn how databases work and why are they important
- Learn data modeling and the 3 levels of relational database design
- Learn what are the steps to create a sound database design
- Learn what database normalization is and how to apply 1NF, 2NF and 3FN in database design
- Learn how to build database relationships: one-to-one, many-to-one and many-to-many
- Understand better every theoretical step by following several concrete examples
- You will be able to design a relational database from scratch
- You will be able to answer the most common interview questions about databases
- You will have a concrete e-commerce database design schema to add to your portofolio
English [Auto] Hi and welcome back in this lecture we will explore how databases work and why are they important in modern computing. I go to some theoretical information about algorithms that might seem weird I promise you it is only in this lecture that I'll do this. First I thought of skipping this part but in order for you to understand the underlying mechanism of databases and answer probably one of the most common interview question how databases indexes work I had to show you this part books of hundreds of pages have been written on the subject but I'll focus only on the core concepts so that you'll get the main idea. I start off the discussion with some theory about the Big-O notation. B-trees and how these concepts are using databases to allow superfast retrieval of data so the Big-O notation is used to classify algorithms according to how they are running time or space requirements grow as the input size grows. This definition may not say much but you'll understand how this works by going through some examples. We we'll be focusing in measuring the time complexity. Let's start with constant time. Imagine we have this line of code that computes X not just that it does not depend on the input size in any way for this line of code. We say that its complexity is big off one or constant time. Now let's see a linear example. Suppose we had the following loop that prints the numbers from 0 2. And this block of code runs and times does we say that this block has the complexity go of. And finally let's look at quadratic time. You can see that the print statement will be executed and times and times that means the time complexity is B go off and squared so doesn't make sense how this Big O notation works for our purposes. This example should suffice. But if you find this topic interesting and want to take a deep dive on the Big-O notation you can find very useful information by doing a simple search on Google. We want to explore the most common algorithm complexities in computing. For example as we signed the previous slide printing an array of elements has the complexity big off and the best sorting algorithm has the complexity the goal of and time slogan. And so what does all this has to do with databases. Let's imagine we have a table of rows we want to search for a specific role was the complexity of that. While in the worst case scenario meaning that the role we are searching for is the last one we will have to iterate over each row until the end. Does the complexity of the search would be go off and as you can see this means that the search time increase is linear at least with the number of rows. Well this is not ideal for a database. What we want is to reduce the impact that size has over the search time. Now by looking at this complexity chart Wouldn't it be great if we could reduce the time of searching to a logarithmic complexity. The big show of Logo and that would imply that the search time is almost unaffected by the number of rows in the table that would be a major improvement. Fortunately we can by using a data structure called B-3 which is at the core of most databases implementations today. Let's explore how it works. So what does the B-3 a bit 3. It's a self-balancing 3 data structure that keeps data sorted and allows searches sequential access insertions and deletions logarithmic time. It is commonly used in databases and file systems. This definition may not say much but let's take an example to understand how it works. Let's imagine we have an array of numbers from 1 to 7. Now in this image you can see the steps needed to create a B-tree from the initial array. The end result is the last three from the picture. We say that these B-3 ease of order free because every node has at most three children. Now let's see what it would mean to search for elements 7 instead of doing seven reads from the initial array we could find it in just rereads how First we look at the root element which is 4. And that would be read number one. We see that and although we are searching for has a higher value so we continue on the right branch. We compare it to six. These will be read the number two still higher. We need to keep searching on the right branch. Finally we find it. And that would be the fourth and the last to read if you are a bit confused Don't worry we'll explore soon. Another example what we should keep in mind is that in the worst case the number of reads We must perform in order to find the node equals the depth of the tree. In our example the B-Tree depth is free and we perform free reads in order to get to the elements segment. This is a more complex B-tree of order and equals 5 meaning that every node can have at most five children. It can store for entries but not the three in this example has the depth of two and every node can store four entries. That means it can index four square equals 16 entries to generalize the number of entries of B-tree can index an equals to the number of entries Bernald raised to the power of D which is the that of the three having these formulae in mind. Let's see how much data a B-tree of order 5 can index in respect to its that as you can see at the depth of 10 it can index more than 1 million entries. As we noted previously the number of reads We must perform in order to get to the knowledge we are searching for equals the 3 that this means that searching for value in an indexed array of 1 million elements takes them reads a far cry from the one million plus reads that would have to be performed on a non indexed array. Now let's see how databases are using the power of B-trees. Here is an example of a database table storing data about employees. We aim to search employees by using a logarithmic time complexity. The solution implemented by most databases which is also their key ingredient is to index the employee ID column into a B-3. In this particular example we can find employee 46 in three steps. The first step is to read the root node and see on which branch to continue. Then we move on on the middle branch and find the ID 46 now to search for this idea in a beat. It takes just 2 reads. You may be thinking right the three that these two. Why did you say three steps. Well the note in the B-3 doesn't start the whole roll data it just thought its location on the hard drive. So we need an additional step step number three that reads the actual road. In conclusion to find data in a database it takes us D-plus one reads Where is the depth of the B-Tree. If you think of the previous slide searching for an employee in a table of 1 million employees would take us 11 reads before moving forward. I want to mention something if the searches were run by the first name then we would have to index the first name column. The B-tree would have looked similar but instead of numbers it would have contained letters. Indexing users by email addresses is common practice since that logging time you must search for them by email address. Now let's put all these pieces together. We said that in order to find a row in the database we need to execute D-plus one reads the total time to search this row in an entry or database table would take D-plus one multiplied by the time to access a block of data on the hard drive. That is a constant and it depends on the hardware equipment. It's rotations per minute or whether or not isn't SSD we just marked that with the HD access time. Next we can compute the depth based on the true order and the total number of entries so the equals log of base and minus one of the results that the time to such a table of and rolls is a logarithmic function of and and that's it. We are done enough with all this theory and mathematics. I hope you understand how databases work at the core level. You should not bother memorizing all this information but if during an interview someone will ask you how databases indexes work. You should tell them that B-trees are used to index table columns in order to reduce the search complexity from a big off and to a big show of Logo. I am on a side note in our examples we have used B-trees of order 5 starting for entries per node. In real life though B-trees store hundreds of entries pattern old making their dept lower and improving search times. Now let's take a quick example. Imagine our problem with 1 million employees and the hard disk access time of zero point two milliseconds. Our database will return results in 2.2 milliseconds versus a classic technique of searching for that row that will take one hundred thousand times more time probably by now you already got an idea why databases are so important of course is because they manage data so efficiently and as you'll see next are simple to design and to query to date. The internet is full of web applications so the database is on top of which they are built we can access useful information in seconds or less. You can think of Google that basically stops every page that is on the Internet and returns millions of results for a search in less than a second. You can think about Facebook that has 1.9 billion users and you can find everyone by their name in less than a second. You can think of Amazon baulking and many others. Before moving forward let's do a quick recap. In this lecture we learn about the big O notation. Then we talk about B-trees and their logarithmic search complexity. Then we so how they are used in databases to drastically improve the search time. Thanks for watching and see you in the next lecture.