DynamoDB overview

A free video tutorial from Sundog Education by Frank Kane
Founder, Sundog Education. Machine Learning Pro

Lecture description

Let's cover DynamoDB from a high level, and learn how it might help us store our chat data more efficiently.

Learn more from the full course

Build a Serverless App with AWS Lambda - Hands On!

Create a chat web app using Amazon Web Services - Lambda, DynamoDB, API Gateway, S3, Cognito, CloudFront, and more.

07:22:26 of on-demand video • Updated January 2021

  • Build a chat application entirely with AWS services, and no stand-alone servers at all
  • Architect and design serverless applications
  • Serve static resources to browsers using AWS's S3 (Simple Storage Service)
  • Use JavaScript to dynamically modify web pages
  • Manage permissions and access policies using IAM (Identity and Access Management)
  • Manipulate and vend data in the cloud using AWS Lambda
  • Use CORS to manage client-side security in serverless apps
  • Store and retrieve data with DynamoDB
  • Model APIs and create SDKs with API Gateway
  • Create a user login system with Cognito
  • Speed up your web app with a CloudFront CDN
So far, everything we've done has been read-only. And while it's very interesting, it doesn't make for a very good chat application, because you can't actually send a message to anyone right now. In order to do that, we need to think about how we're actually storing our data, and S3 is really not going to work for us. So we're going to move to DynamoDB in this section. DynamoDB is a NoSQL database that AWS provides, and we'll talk a lot about how it works, what a NoSQL database is, and why we're using it. By the end of this section, you'll actually be able to send a message using the chat app, so that's a pretty big step for us. So let's get started.

So far we've been using S3 for our data storage, but that's probably not the best choice, given that we want to be able to update, read, and change these values dynamically at large scale. One popular option in AWS for this sort of thing is DynamoDB. Let's go over what DynamoDB can do.

Now, one of the primary descriptions of DynamoDB is that it's a NoSQL database. NoSQL started off meaning "non-SQL," but as SQL-style interfaces started to emerge and develop on top of these databases, it became more of a "not only SQL." So a bit of a retcon there. But basically, these are not your classic databases. Most of them use a simple set of APIs to access the data rather than a complex query language. Some have adapted to use SQL, or subsets of SQL, just for a familiar way to access the data, and DynamoDB does have some of that, but generally speaking you're not going to be writing complex SQL statements with joins and things like that.

Now, most NoSQL databases are really just big distributed key-value storage systems, which is key to their speed. The key is usually a string of some sort, and determining how to create that key is a really important part of using NoSQL databases.

The value part tends to be structured, but not strictly. What that means is that the value is stored like a simple map, but the keys and types within that map are not usually strongly enforced. In DynamoDB especially, the table schema has no information about the attributes of the item. It's really up to the client code to consistently treat attributes in the same way.

Now, most major databases have strong ACID guarantees around transactions, and if you're not a database geek, ACID stands for atomicity, consistency, isolation, and durability. Basically, it means that when you make a transaction into a database, you can be guaranteed that it's been written before you actually get a response back. The real key to NoSQL's scalability is that it doesn't provide those guarantees. If you write data into it, there's a really, really, really good chance that it will write it successfully, but it might not be immediate, and you have to be able to live with that. Often that's a valid tradeoff to make, because the speed and scalability that a NoSQL approach gives you is worth giving up some of those ACID guarantees.

Generally speaking, NoSQL databases provide parts of the guarantee, but most of them really don't support transactions at all, and certainly not across multiple tables. So while you would be able to write a bunch of data to multiple tables and commit it all at once in MySQL, PostgreSQL, Oracle, or any other major SQL database, most NoSQL databases can provide none of these guarantees on more than a single record at a time, if that. But again, that's the price we pay for the scalability and speed of a NoSQL approach.

Now, because NoSQL databases are relatively simple and don't have transactions, they tend to scale really well. For example, there's no limit on the size of a DynamoDB table.
You can also just provision a lot of read and write capacity on this essentially unlimited table. Just keep throwing more and more disk and more and more servers at it, and you can scale this thing out to be as massive as you want.

Let's talk about tables in DynamoDB. DynamoDB tables are key-value records, but they have a little more structure to them than that. There are three parts to any DynamoDB record, or "item," as they're called in the DynamoDB documentation. One is the hash key. The hash key, also known as the partition key, is the first part of the key. You need to have a hash key in your table, and if you only have a hash key, then its values have to be unique. Then you can have a sort key, which is a child of the hash key. So within a hash key you can have a bunch of items with unique sort keys. And then we have the data, which is the rest of the record and contains the attributes for the record itself.

Additionally, DynamoDB has the ability to create secondary indices on tables. A secondary index exists on top of an existing table, and it looks exactly like a table: it has a hash key, a sort key, and data, and you can choose what data to copy into the index. So if you need to access a table in multiple ways, you'll want to create a secondary index, and we will actually do that when we start using DynamoDB in this course.

DynamoDB has a bunch of different supported data types. They also chose somewhat obtuse names for these types, so this table is basically your cheat sheet for helping us disambiguate the names from what they really represent. Several of these types have a set type. Since this is not a traditional relational database, you can actually have multiple values for the same attribute on a single record without having to do joins or anything. That's what set types are all about.

So let's take a look at some of these types. B and BS: what a terrible name. Interestingly, you can store binary data in DynamoDB.
It's important to understand, though, that the data is stored as a base64-encoded string. This means you'll use more storage than the binary version of the data would, because base64 encoding isn't as efficient. So while it's cool that you can store binary data in DynamoDB, don't do it because you think you're going to save space. In fact, it's quite the opposite.

BOOL: you can store individual Boolean values. Storing a Boolean set doesn't make a lot of sense; you may as well just use a list of Booleans if you need multiple Boolean values. Which brings us to L and M, which let you store lists and maps as attribute values in DynamoDB. Maps are basically like mini records without a key, and lists are of course just lists of values. Each value can have whatever type you want, so you can mix and match different types within a list. That's totally OK.

Then we have the number type. Of course, you'll need to send these in the API as strings, because we're dealing with a JSON interface, but they are treated as numbers within DynamoDB, especially when used as a sort key. Finally, we have basic strings and sets of strings for dealing with textual data.

For getting items back out of DynamoDB, there are a few ways to do it. One is the GetItem interface, and that's the simplest way to get an item: just provide the key, if you know it. So if you know the full key, this is definitely the simplest and fastest way to get the item back. The result will be a single item, or if that item doesn't exist, you'll get back an error.

Then we have Query. If you know part of the key, you can do a query. Queries always require a hash key. So if you don't have a sort key, there's no reason to do a query, because you can just do a get if you already have the hash key.
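To make the type names in this part of the lecture more concrete, here is a minimal Python sketch of how plain values might be wrapped in DynamoDB's typed attribute-value format, the JSON shape used by the low-level API. The helper name `to_attribute_value`, the sample item, and its attribute names are all hypothetical and written only for illustration; real SDKs ship their own serializers.

```python
import base64


def to_attribute_value(value):
    """Wrap a plain Python value in a DynamoDB-style type descriptor.

    Hypothetical helper for illustration only; it covers the types
    mentioned in the lecture, not every case a real SDK handles.
    """
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return {"BOOL": value}
    if isinstance(value, (int, float)):
        return {"N": str(value)}  # numbers travel as strings in JSON
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, bytes):
        # binary is sent as base64 text, which is longer than the raw bytes
        return {"B": base64.b64encode(value).decode("ascii")}
    if isinstance(value, list):
        return {"L": [to_attribute_value(v) for v in value]}  # mixed types are fine
    if isinstance(value, dict):
        return {"M": {k: to_attribute_value(v) for k, v in value.items()}}
    if isinstance(value, (set, frozenset)) and value and all(isinstance(v, str) for v in value):
        return {"SS": sorted(value)}  # string set
    raise TypeError(f"unsupported type: {type(value).__name__}")


# Hypothetical chat message item; the attribute names are made up for this sketch.
item = {
    "ConversationId": "1",
    "Timestamp": 1500000000,
    "Message": "Hello!",
    "Read": False,
    "Tags": {"greeting", "first"},
    "Payload": b"\x00\x01\x02\x03\x04\x05",
}
encoded_item = {name: to_attribute_value(v) for name, v in item.items()}
raw = item["Payload"]
print(encoded_item["Timestamp"])  # {'N': '1500000000'}
print(len(raw), "raw bytes ->", len(encoded_item["Payload"]["B"]), "base64 characters")
```

The last line shows the base64 overhead mentioned above: every 3 raw bytes become 4 characters of text.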
You can also provide a constraint on the sort key, so you could do something like retrieving all of the items where the sort key is above a particular value, which can be handy. You can also provide a filter, which DynamoDB will apply after it satisfies the key constraints and before the results are returned.

And if you really don't know the key, you can just do a Scan. It's best to equate this to a relational database's full table scan: it will look at every single record, and it will use all the read capacity that goes along with that. As the records are pulled, the filter you provide will be applied to determine what gets returned. So it's kind of a brute-force approach. You definitely want to avoid scans whenever you can if you care at all about performance and scalability, but it's there when you need it.

Let's take a minute to talk about pagination. In a regular database, you might select a fetch size on your database connection, and if you want to do UI pagination, you would use limits and offsets to configure a page of results. DynamoDB doesn't do that. What DynamoDB does is load up to one megabyte of data from the data store when satisfying your query or scan request, and this is pre-filter data. In the case of a query it's based on the key, but a scan will go through your entire table one megabyte at a time. When you get back a result set, you may get a description of the last key processed, and this can be used to set an exclusive start key on the next request to get the next chunk of results. The start key is exclusive because it is not included in the result; that's because it was included in the previous results. And since filters are applied after loading the data, it's completely possible that you get no results in your result set and still get a last key processed. This is because your filter may eliminate all the records that were loaded.

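The pagination behavior described above can be sketched as a simple loop. This is a minimal Python illustration, not the course's code: `LastEvaluatedKey`, `ExclusiveStartKey`, and `Items` are the field names DynamoDB uses in its responses and requests, while `fetch_all`, `make_fake_scan`, the `keep` filter argument, and the sample rows are hypothetical stand-ins so the example runs without an AWS account. Here `page_size` plays the role of the one-megabyte read limit.

```python
def fetch_all(page_fn, **request):
    """Follow LastEvaluatedKey until the service reports no more pages."""
    items = []
    while True:
        response = page_fn(**request)
        items.extend(response.get("Items", []))  # a page may legitimately be empty
        last_key = response.get("LastEvaluatedKey")
        if last_key is None:
            return items
        request["ExclusiveStartKey"] = last_key  # resume just after this key


def make_fake_scan(rows, page_size):
    """Build a stand-in for a scan call over an in-memory, ordered list.

    Hypothetical test double: page_size stands in for the 1 MB read limit,
    and the filter runs only after a page has been loaded.
    """
    def fake_scan(ExclusiveStartKey=None, keep=lambda row: True):
        start = 0 if ExclusiveStartKey is None else ExclusiveStartKey["id"] + 1
        page = rows[start:start + page_size]
        response = {"Items": [row for row in page if keep(row)]}
        if start + page_size < len(rows):
            response["LastEvaluatedKey"] = {"id": page[-1]["id"]}
        return response
    return fake_scan


# Six made-up rows; only the last two match the filter below.
rows = [{"id": i, "sender": "bob" if i >= 4 else "alice"} for i in range(6)]
scan = make_fake_scan(rows, page_size=2)
matches = fetch_all(scan, keep=lambda row: row["sender"] == "bob")
print(matches)
```

In this run the first two pages come back empty but still carry a last evaluated key, because the filter removed every row that was loaded. The loop keeps going until that key is absent rather than stopping at the first empty page.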
Now, when you create a DynamoDB table, you set the capacity you want. You set read and write capacity separately, so you can tune your table to your own use case. Let's talk about read units first. One read unit allows you one consistent read per second of an item up to four kilobytes in size. If you do eventually consistent reads, you can do twice that. DynamoDB gives you the option of how much consistency you want, but you have to pay more to do immediately consistent reads. As far as write units go, writes are a little more expensive. A single write unit allows you to write one item of up to one kilobyte per second.

With this understanding of capacity units, you can start to see why avoiding scans is important. You don't want to burn all of your read capacity doing a single scan that returns a small number of records. This is why table design is so important.

And that's the basics of DynamoDB. It's a lot, but it will become second nature once you start using it, which we'll do soon. You probably have a lot of questions at this point, but bear with us, because once you start to get your hands dirty with DynamoDB, I think it's going to make a lot more sense. So let's move on.
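As a rough worked example of the capacity arithmetic from this lecture, the Python sketch below estimates read and write units for a steady rate of single-item requests. The function names are hypothetical, and the sketch assumes 1 KB means 1,024 bytes and that item sizes round up to the next 4 KB (reads) or 1 KB (writes) boundary.

```python
import math


def read_units(item_bytes, reads_per_second, strongly_consistent=True):
    """Read capacity units for a steady rate of single-item reads."""
    units_per_read = math.ceil(item_bytes / 4096)  # one unit covers up to 4 KB
    total = units_per_read * reads_per_second
    # eventually consistent reads cost half as much
    return total if strongly_consistent else math.ceil(total / 2)


def write_units(item_bytes, writes_per_second):
    """Write capacity units for a steady rate of single-item writes."""
    return math.ceil(item_bytes / 1024) * writes_per_second  # one unit covers up to 1 KB


print(read_units(3000, 10))                             # 10
print(read_units(3000, 10, strongly_consistent=False))  # 5
print(read_units(9000, 10))                             # 30
print(write_units(3000, 10))                            # 30
```

Note how a 3,000-byte item costs one unit per consistent read but three units per write, which is the sense in which writes are more expensive.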