Walkthrough of common architectures

Bo Andersen
A free video tutorial from Bo Andersen
Lead Developer
4.5 instructor rating • 4 courses • 62,786 students

Lecture description

In this lecture, we take a look at some common use cases for the Elastic Stack, and how a simple architecture might evolve over time to incorporate more components of the Elastic Stack.

Learn more from the full course

Complete Guide to Elasticsearch

Learn Elasticsearch from scratch and begin learning the ELK stack (Elasticsearch, Logstash & Kibana) and the Elastic Stack.

12:10:23 of on-demand video • Updated September 2020

  • Build a powerful search engine with Elasticsearch
  • Understand the theory of Elasticsearch and how it works under the hood
  • Write complex search queries
  • Be proficient with the concepts and terminology of Elasticsearch
Now that you know what Elasticsearch can be used for and understand what the Elastic Stack is, let’s take a look at a couple of architectures and how Elasticsearch is commonly used. Let’s start out simple and go from there.

Suppose that we have an ecommerce application running on a web server. The data, such as the product categories and the products themselves, is stored within a database. When a product page is requested, the web application looks up the product within the database, renders the page, and sends it back to the visitor’s browser. Now we want to improve the search functionality on the website, because so far it has just been using the database for this. As you might know, that’s not what databases are really good at. So after looking around online, we have decided to go with Elasticsearch, because it seems to be the best tool for the job.

That’s an excellent choice, but how do we integrate it with our current architecture? The easiest way is to communicate with Elasticsearch from our application. When someone enters a search query on our website, a request is sent to the web application, which then sends a search query to Elasticsearch. This can be done with a plain HTTP request, but typically you will use one of the client libraries that are provided for all popular programming languages. When the application receives a response, it can process it and send the results back to the browser.

But how do we get data into Elasticsearch in the first place, and how do we keep it updated? Whenever a product is added or updated, we should add or update it within Elasticsearch too, not just within the database. So essentially we’ll duplicate some of our data. That might sound like a bad idea, but I will get back to why it is actually the best approach.

Okay, that’s great if we are launching our website with Elasticsearch from the beginning, but what if we are adding it to an existing application that already has data, and potentially lots of it? How do we get that data into Elasticsearch then? Well, that takes a bit more work. You could write a script that does it all in one go if you don’t have that much data. If you have more, you would need to paginate - or scroll through - your data. There are some open source projects that can help with this, but not all of them are actively maintained, and usually it’s a fairly simple task to write such a script yourself. From the moment the data has been imported, your application will keep it up to date whenever something changes.
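To make this a bit more concrete, here is a minimal sketch of what that application-side integration could look like, using the official Elasticsearch client for Python. The lecture doesn’t prescribe a language or any of these details, so the index name, field names, and connection URL are made-up examples.

```python
# Minimal sketch of the application-side integration described above,
# assuming the official Elasticsearch Python client (8.x). The index name,
# field names, and URL are illustrative assumptions.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

def save_product(product: dict) -> None:
    # Called whenever a product is added or updated in the database,
    # so the search index stays in sync with the primary data store.
    es.index(index="products", id=product["id"], document=product)

def delete_product(product_id: str) -> None:
    # Keep Elasticsearch in sync when a product is removed.
    es.delete(index="products", id=product_id)

def search_products(query: str) -> list[dict]:
    # Full-text search across a couple of assumed fields.
    response = es.search(
        index="products",
        query={
            "multi_match": {
                "query": query,
                "fields": ["name", "description"],
            }
        },
    )
    return [hit["_source"] for hit in response["hits"]["hits"]]
```

The important part is the pattern rather than the exact calls: the database stays the source of truth, and the application mirrors every product change to Elasticsearch while sending search queries there instead of to the database.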
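And here is a rough sketch of the kind of one-off import script mentioned above for existing data. The database access function is purely hypothetical and stands in for whatever paginated query your application would use; the bulk helper from the Python client is the only real API being shown.

```python
# Sketch of a one-off import script that pages through an existing
# "products" table and bulk-indexes the rows into Elasticsearch.
# get_products_page is a hypothetical placeholder for your database layer.
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")
PAGE_SIZE = 1000

def get_products_page(offset: int, limit: int) -> list[dict]:
    # Placeholder: replace with a real query, e.g.
    #   SELECT * FROM products ORDER BY id LIMIT %s OFFSET %s
    return []

def generate_actions():
    # Walk through the table one page at a time instead of loading
    # everything into memory at once.
    offset = 0
    while True:
        page = get_products_page(offset, PAGE_SIZE)
        if not page:
            break
        for product in page:
            yield {"_index": "products", "_id": product["id"], "_source": product}
        offset += PAGE_SIZE

# helpers.bulk groups the generated documents into bulk requests for us.
helpers.bulk(es, generate_actions())
```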
That’s probably the simplest usage of Elasticsearch that you will come across, so let’s expand a bit on that architecture. Our boss has told us that he wants a dashboard where he can see the number of orders per week, the revenue, and so on. We could build our own interface, but we would save a lot of time by using Kibana, so that’s what we have decided to do. As you know, Kibana is just a web interface that interacts with Elasticsearch, so we don’t need to add any data to it. We just need to run it somewhere, on a machine of our choice. To keep things simple, let’s assume that we dedicate a machine - or a virtual machine - to running Kibana. All we need to do is run Kibana on that machine and configure it to connect to Elasticsearch. That’s it.

As time passes, the business continues to grow, and we are really starting to see a lot of traffic on the website. Our server is starting to sweat a bit, and we want to make sure that we can handle future growth. Part of this is handling spikes in traffic and knowing when it’s time to add another web server. To help us with this, we have decided to make use of Metricbeat to monitor system level metrics, such as CPU usage and memory usage. We can also instruct Metricbeat to monitor our web server process if we want to.

So we start up Metricbeat on the web server, but how does the data end up in Elasticsearch? As you might remember from the previous lecture, Beats supports sending data to both Logstash and Elasticsearch. For simplicity, we will do the latter. What we will actually be doing is having Metricbeat send data to a so-called Elasticsearch ingest node. The details are not important at this point, so don’t worry about what an ingest node is for now. The point is that we can send data to Elasticsearch, and we can perform simple transformations on our data in case we need to (you will see a small example of this below).

Now that system metrics are being stored within Elasticsearch, we can visualize the data within Kibana. What’s really cool is that we can actually instruct Metricbeat to configure a dashboard within Kibana. That’s just a default dashboard that displays data that you are probably interested in. You can of course tweak it to your needs, so consider it a template. But how does Metricbeat actually do this when it doesn’t communicate directly with Kibana, you might wonder? Well, Kibana stores its configuration within Elasticsearch, so no matter where your Kibana instance runs - or if you start up a fresh instance - your settings will be applied automatically. A bit later in the course, you will see exactly where that configuration is stored.

Okay, so far so good. We can now monitor how our server is performing, and we can set up alerts within Kibana if we want to be notified when CPU or memory usage reaches some threshold. When that happens, we should be ready to add an additional web server.

As the business continues to grow, so do the development team and the code base. Apart from monitoring system level information on our web server, we want to monitor two more things: the access and error logs from our web server. While we do make use of Google Analytics on the website, the access log can provide us with some useful information, such as how long it took the web server to process each request. This enables us to monitor response times and aggregate requests per endpoint (a sketch of such a query follows below). By doing this, we can easily identify if some bad code is deployed that causes the response time to spike for a specific endpoint. Not only can this cause a bad user experience, it can also put additional stress on our system, so we want to detect if that happens.

Since the business has grown, so has the number of features that need to be implemented. The development team is bigger now, and as a result we are able to push more code to production. Of course that also increases the risk of introducing bugs, so we want to stay on top of that, apart from the testing that we do before deploying new code. The web server’s error log can help us with that, and optionally any application log that we might have. Filebeat is a perfect tool for this job! All we need to do is start up Filebeat, and we will have our logs processed and stored within Elasticsearch, ready for analysis within Kibana. Pretty easy, right? Filebeat has built-in modules for this, which take care of the heavy lifting in terms of parsing the logs, handling multi-line events such as stack traces, and so on, so for simple use cases we don’t need any configuration to get this to work.
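As for monitoring response times per endpoint, that kind of question typically translates into an aggregation, whether you build it as a Kibana visualization or query Elasticsearch directly. Here is a hedged sketch of such a query; the index pattern and the field names are assumptions for illustration, and the real names will depend on your web server and the Filebeat module you use.

```python
# Sketch of the kind of aggregation behind "average response time per endpoint".
# The index pattern ("filebeat-*") and the fields (url.path,
# http.response.time_ms) are assumed names, not guaranteed by any module.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

response = es.search(
    index="filebeat-*",
    size=0,  # we only care about the aggregation buckets, not the hits
    query={"range": {"@timestamp": {"gte": "now-15m"}}},
    aggs={
        "per_endpoint": {
            "terms": {"field": "url.path", "size": 10},
            "aggs": {
                "avg_response_time": {"avg": {"field": "http.response.time_ms"}}
            },
        }
    },
)

for bucket in response["aggregations"]["per_endpoint"]["buckets"]:
    print(bucket["key"], bucket["avg_response_time"]["value"])
```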
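And to give a rough idea of what the “simple transformations” on an ingest node mentioned earlier could look like, here is an illustrative sketch that defines an ingest pipeline through the Python client. The pipeline name and processor are made up for the example; the lecture deliberately leaves the details of ingest nodes for later.

```python
# Illustrative ingest pipeline: tag every incoming document with the time
# it was ingested. The pipeline name and processor are example choices only.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

es.ingest.put_pipeline(
    id="add-ingest-timestamp",
    description="Tag incoming documents with the time they were ingested",
    processors=[
        {"set": {"field": "event.ingested", "value": "{{_ingest.timestamp}}"}}
    ],
)
```

A shipper such as Metricbeat (or any index request) can then be pointed at a pipeline like this, so the transformation happens inside Elasticsearch rather than in our application.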
Okay, so let’s fast forward six months. In the meantime, we have added two more web servers to handle more traffic, so our architecture now looks like this. We are also storing more kinds of data within Elasticsearch, including events. An example could be when a product is added to the cart, because we then want to do some analysis within Kibana. So far, we have been using Elasticsearch’s ingest nodes, where we could do very simple data transformations. But now we need to do more advanced processing of our events, such as data enrichment, or whatever the case might be.

We could do this within our web application, but that has a few disadvantages. First of all, our business logic code would be cluttered with event processing, and this might increase response times unless we do it asynchronously. Either way, it leads to more code that is not directly related to actually adding a product to the cart, for instance. Secondly, event processing would happen in multiple places. It’s not that big of a problem that it happens on more than one web server, because they should all be running the same version of the code. But we might have a backoffice server where administrative tasks are performed, such as managing products. Or perhaps we even want to migrate to a microservice architecture. Whatever the case might be, we can quickly end up in a situation where events are processed in many different places, making things harder to maintain.

Wouldn’t it be better if we could centralize the event processing and have it all done in one place? We can accomplish that with Logstash, which also gives us more flexibility in terms of processing our events than Elasticsearch’s ingest nodes do. So let’s go ahead and add Logstash to the architecture. We’ll be sending events from our web servers to Logstash over HTTP (a small sketch of this follows below). Logstash will then process the events however we instruct it to, and send them off to Elasticsearch. This way, we keep event processing out of our web applications; all they need to do is send the initial data to Logstash, and the rest of the processing happens there.

But what about the data from Metricbeat and Filebeat? As for Metricbeat, chances are that we don’t need to do any custom processing of the data, but that might be the case for Filebeat if we are using custom logging formats, or if we want to do some additional processing. If that’s the case - or if we just prefer to centralize things - we can definitely send the data to Logstash for processing. Otherwise, it’s also okay to send the data directly to Elasticsearch, if we don’t mind that data is being added from multiple places.

So what about the rest of the data, then? For instance, when a new product is added, modified, or deleted. You could leave everything as is and have your web application update the data in Elasticsearch directly. That’s the easiest approach for sure, but it is also more error prone, as code errors in the application might break event processing. In the short term, this would be acceptable while migrating everything to Logstash events, keeping everything centralized within Logstash pipelines. Ideally, our web application would only be querying Elasticsearch for data, not modifying any data, so in a perfect world, read-only access would be sufficient. Of course that might not always be the case in the real world, but it’s a good goal to have nevertheless.
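To illustrate the event flow from the web servers, here is a small sketch of an application posting an “added to cart” event to Logstash over HTTP, assuming Logstash is listening with an HTTP input on an internal host and port of our choosing. The host, port, and event fields are illustrative assumptions, not something the lecture specifies.

```python
# Sketch of a web application sending an event to Logstash over HTTP,
# assuming a Logstash HTTP input is listening at the assumed address below.
from datetime import datetime, timezone

import requests

LOGSTASH_URL = "http://logstash.internal:8080"  # assumed host and port

def publish_event(event_type: str, payload: dict) -> None:
    event = {
        "type": event_type,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        **payload,
    }
    # Fire-and-forget from the application's point of view; Logstash takes
    # care of enrichment and of shipping the event to Elasticsearch.
    requests.post(LOGSTASH_URL, json=event, timeout=2)

publish_event("cart.product_added", {"product_id": "123", "quantity": 1})
```

The application only knows how to hand the raw event over; everything else, from enrichment to indexing, is kept centralized in the Logstash pipeline.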
Alright, that was an example of a simple architecture evolving over time to become slightly more advanced. It’s a pretty typical example of how you will use Elasticsearch and the Elastic Stack in general, although there are of course variations and other use cases as well.