Dealing with Concurrency

A free video tutorial from Sundog Education by Frank Kane
Founder, Sundog Education. Machine Learning Pro

Lecture description

What happens if two different clients try to update a document at the same time? We'll practice a way to avoid these possible sources of contention.

Learn more from the full course

Elasticsearch 6 and Elastic Stack - In Depth and Hands On!

Search, analyze, and visualize big data on a cluster with Elasticsearch, Logstash, Beats, Kibana, and more.

08:03:25 of on-demand video • Updated May 2019

So when you're dealing with distributed systems, sometimes you can run into really weird problems with concurrency. What happens when two different clients are trying to do the same thing at the same time? Who wins? These are concurrency issues, so let's talk about how to deal with them in Elasticsearch.

So here's the problem. Let's say that we have two different clients, say they're running a web application on a big distributed website, and they're maintaining view counts for the documents that are viewed. We'll call those documents pages on our website. Now let's imagine that two people are viewing the same page at the same time through two different web servers. At this point I have two different Elasticsearch clients retrieving the view count for a page, and since they're both asking at the same time, each one is trying to get the current count for that page. Let's say it comes back with the number 10: 10 people have viewed this page so far. Now both of these clients want to increment that at the same time, so each one increments the view count and figures out that it wants to write an update for this document with a view count of 11. And they will both in turn write a new document update with a view count of 11. But it should have been 12.

So you see, there was this brief window of time between retrieving the current view count for the page and writing the new view count, during which things went wrong due to this concurrency issue. And this is a very real problem: if you have a lot of people hitting your website, or hitting your Elasticsearch service or cluster, at the same time, this sort of weirdness can happen.

So what do we do about it? Well, there's a solution called optimistic concurrency control, so let's walk through how this would work. It makes use of that version field that we talked about when we talked about updates.
This is where it comes in handy. So in the same situation, we have two different clients that are trying to retrieve the current view count for a given page document from Elasticsearch, and they both get the number 10 back. But remember, when you request something from Elasticsearch, it also gives you back a version number for that document. So I now know that the view count of 10 is associated explicitly with a given version number of that document. Call that version number 9, just for the sake of argument. So now when these clients say, I want to write a new value for that view count, they can specify that they're basing it on what they saw in version 9. When you do an update, you can specify the version that you're updating.

So what would happen if two clients try to update the same version is that only one of them would succeed. Let's say the first one successfully wrote the count of 11, given version number 9. The other one would try to say, OK, I want to update this document explicitly for version number 9, and Elasticsearch could then say: hey, my current version is actually 10, not 9. Something's wrong here. You're basing this update on the wrong information. And at that point you can try again on that particular client. So it would just go back and reacquire the current view count for that page, start over basically, get back version 10 of that document, which contains 11, increment that to 12, and write it again, hopefully successfully.

Now, you don't necessarily have to do this by hand. There is another parameter called retry_on_conflict that you can set when you do an update, which will automatically retry if this happens. So that's kind of a nifty feature. So that's optimistic concurrency control. You might want to stare at this slide for a little bit if it doesn't make sense.
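The walkthrough above can be tried out without a cluster. What follows is only a minimal sketch in Python: `VersionedStore`, `VersionConflict`, `seed`, and `demo` are hypothetical names invented for illustration, not part of any Elasticsearch client, and the store is a toy in-memory stand-in that mimics just the version check. It assumes the same numbers as the example: a page at version 9 with a view count of 10.

```python
class VersionConflict(Exception):
    """Raised when a write is based on a version that is no longer current."""


class VersionedStore:
    """Toy in-memory stand-in for a document store with per-document versions."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, body)

    def seed(self, doc_id, version, body):
        # Test helper: place a document at a chosen version.
        self._docs[doc_id] = (version, dict(body))

    def get(self, doc_id):
        version, body = self._docs[doc_id]
        return version, dict(body)

    def index(self, doc_id, body, version=None):
        current_version = self._docs.get(doc_id, (0, None))[0]
        # Reject the write if the caller read a version that is now stale.
        if version is not None and version != current_version:
            raise VersionConflict(
                f"current version [{current_version}] is different "
                f"than the one provided [{version}]"
            )
        new_version = current_version + 1
        self._docs[doc_id] = (new_version, dict(body))
        return new_version


def demo():
    store = VersionedStore()
    store.seed("page-1", version=9, body={"view_count": 10})

    # Both clients read before either one writes: both see version 9, count 10.
    version_a, doc_a = store.get("page-1")
    version_b, doc_b = store.get("page-1")

    # Client A writes first, based on version 9, and succeeds.
    store.index("page-1", {"view_count": doc_a["view_count"] + 1}, version=version_a)

    # Client B also claims version 9, which is now stale, so it is rejected.
    conflict = False
    try:
        store.index("page-1", {"view_count": doc_b["view_count"] + 1}, version=version_b)
    except VersionConflict:
        conflict = True
        # Start over: re-read the document, then increment and write again.
        version_b, doc_b = store.get("page-1")
        store.index("page-1", {"view_count": doc_b["view_count"] + 1}, version=version_b)

    return conflict, store.get("page-1")
```

Calling `demo()` reports that the second client hit a conflict and that the page ends up with a view count of 12 rather than 11. Without the `version` argument, both writes would be accepted and one increment would be lost.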
You know, again, the idea is that if you have many web servers or many clients that are trying to talk to Elasticsearch at once and update the same document at the same time, you can use the version numbers to ensure that you're not stomping on each other. OK. Explicit version numbers on updates and retry_on_conflict are the ways to work around this issue. So let's just go and see how it works in practice.

So let's see optimistic concurrency control in action. I've already spun up my virtual machine for Elasticsearch and logged into it through SSH using PuTTY on my Windows system. Let's go ahead and retrieve the current document for Interstellar and see what the current version number is; imagine that we're trying to update Interstellar. To do that, I would do something like curl dash X GET 127.0.0.1:9200, slash movies for the index, slash movie for the type, and the document ID is 109487, and we'll pretty-print the results. OK, so you can see that the current version number I'm working with is three.

So what I can do is specify the version number that I'm modifying when I'm doing an update, so let's do an update. Let's say curl dash X PUT 127.0.0.1:9200 slash movies slash movie slash 109487, and watch this: question mark version equals three. That's saying that I am explicitly updating version number three, and if that's not the latest version you have, give me an error back. OK. So let's actually give it some data to update the document with. Open curly bracket, genres is IMAX and Sci-Fi, then title, and we'll just update the title here to be something different, Interstellar Foo, and year, the release year, is still 2014. All right. So you can see that succeeded, and I have a new version number as a result of that.

Now let's say that I'm another client who was trying to do that same update at the same time.
Let's try that same command again, where I'm saying, OK, I want to update version number three explicitly. So imagine this is coming from some other server somewhere, or some other client who didn't get the memo that somebody else already updated this to a new version. And you can see that I actually got a conflict error there: version conflict engine exception, reason: version conflict, current version 4 is different than the one provided, 3. So I got an error, just like I should, because Elasticsearch has detected that I'm making an update request based on outdated information. So this is how optimistic concurrency control works: by taking note of the version number that you get back when retrieving a document, and then specifying that version number when you attempt to update that document. It's a very reliable way around this problem.

Let me show you how retry_on_conflict works to handle this automatically. For example, I could say curl dash X POST, since it's a partial update this time, 127.0.0.1:9200 slash movies slash movie slash 109487 slash underscore update, then a question mark, retry underscore on underscore conflict equals five, dash d, and this time we'll just update that title and nothing else, and change it back to what it was. Like this. All right.

So what's going on here is I'm doing an update query, so under the hood it's going to automatically retrieve the document that currently exists, the current version, change it, and submit a new one. This is basically automatically doing what we did before. It's getting the current version and then it's going to change it. In this case it's only going to change the title, and then try to put in a new copy of the document under the next version number, but because I'm saying retry on conflict, it will automatically implement optimistic concurrency control.
And if in fact there is a conflict, it will just keep retrying: it will go back, get the current version again, and try again until it actually gets a consistent response. So if we do have a concurrency issue, this retry_on_conflict syntax with a partial update will just automatically do the right thing. So it's very handy. Let's see if it actually works. Sure enough, it did: it successfully updated the document. Let's just make sure that that's what's actually stored at this point: curl dash X GET 127.0.0.1:9200 slash movies slash movie slash 109487, question mark pretty. And sure enough, our title is back to what it was, Interstellar, and everything looks great.

So that's optimistic concurrency control in action. You can either do it yourself, by getting a document, taking note of the version, and specifying it with your update, or you can use partial updates with the retry_on_conflict parameter set, which does it all for you. The value of retry_on_conflict is how many times it will retry before finally giving up. So, you know, if it were to actually do five retries in that previous example and still had a problem, well, I might want to raise some sort of an issue to me or to the user saying something really weird is going on here, and that maybe there's some sort of attack going on on my website where everyone's trying to hit one document at the same time, or something. Who knows. But that's it in action: optimistic concurrency control. You can see it's not really that hard, and it's a very useful tool for managing weird concurrency issues on your Elasticsearch cluster.
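The retry behavior described here can be pictured as a bounded read-modify-write loop. The following Python sketch is an illustration under stated assumptions, not how Elasticsearch implements it: `TinyStore`, `VersionConflict`, `update_with_retry`, and `racing_change` are hypothetical names, the store lives in memory, and the "other client" is simulated by a competing write squeezed in between our read and our write. Only the parameter name `retry_on_conflict` and its meaning (how many retries before giving up) come from the lecture.

```python
class VersionConflict(Exception):
    """Raised when a write names a version that is no longer current."""


class TinyStore:
    """Toy in-memory store; every successful write bumps the version by one."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, body)

    def get(self, doc_id):
        version, body = self._docs[doc_id]
        return version, dict(body)

    def index(self, doc_id, body, version=None):
        current = self._docs.get(doc_id, (0, None))[0]
        if version is not None and version != current:
            raise VersionConflict(f"current version {current}, provided {version}")
        self._docs[doc_id] = (current + 1, dict(body))
        return current + 1


def update_with_retry(store, doc_id, change, retry_on_conflict=0):
    """Read, apply `change`, write; on conflict re-read and try again.

    Gives up (re-raises) after `retry_on_conflict` retries.
    """
    retries = 0
    while True:
        version, body = store.get(doc_id)
        try:
            return store.index(doc_id, change(body), version=version)
        except VersionConflict:
            if retries >= retry_on_conflict:
                raise
            retries += 1


def racing_change(store, doc_id, interfering_writes):
    """Build a change function that lets a rival client write first, N times."""
    remaining = [interfering_writes]

    def change(body):
        if remaining[0] > 0:
            remaining[0] -= 1
            # Simulated rival: increments the counter before our write lands.
            rival_version, rival_body = store.get(doc_id)
            rival_body["view_count"] += 1
            store.index(doc_id, rival_body, version=rival_version)
        body["view_count"] += 1
        return body

    return change


store = TinyStore()
store.index("page-1", {"view_count": 10})

# Two rival writes get in the way; five retries are more than enough.
update_with_retry(store, "page-1", racing_change(store, "page-1", 2), retry_on_conflict=5)
final_version, final_doc = store.get("page-1")
```

In this run the rival wins twice, the third attempt succeeds, and no increment is lost: the count goes from 10 to 13. If the rival kept winning for longer than the retry budget allows, `update_with_retry` would raise `VersionConflict`, which is the point at which the caller would want to flag that something unusual is going on.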