Introduction to Event Sourcing

Daniel Ciocîrlan
A free video tutorial from Daniel Ciocîrlan
Software Engineer & Best-Selling Instructor
4.7 instructor rating • 11 courses • 67,237 students


Akka Persistence with Scala | Rock the JVM

A must-have for Akka developers: write long-term reactive systems with Akka Persistence and PostgreSQL or Cassandra!

07:05:46 of on-demand video • Updated March 2021

  • Learn advanced Akka with Persistent Actors
  • Write long-lived, fault-tolerant distributed systems
  • Use Akka Persistence in production with PostgreSQL or Cassandra
  • Adopt a new mental model with Event Sourcing
All right, welcome back. It's about time we dived into the meat of this course. I'm Daniel, and in this lecture we're going to have a brief overview of the fundamental assumptions of Akka Persistence, and we're going to talk about event sourcing.

I assume that, from your experience writing backend applications, you've surely needed to interact with some form of long-term storage: databases, files, or some cloud-based store. In the context of Akka, we're interested in the scenario of actors interacting with a persistent store.

The traditional school of thought is to make the database a reflection of the current state of things, which is problematic for a few reasons. Traditional databases have a hard time answering questions such as: how do you query a previous state, for example when you ask your bank for a statement from two months ago? Or how do you trace how something arrived at its current state? Examples include online stores tracking orders, banks tracking transaction histories, chat rooms like Slack, and document versioning in the style of Dropbox. Even we as programmers use version control systems, which keep not just the code in its current form but the sequence of changes that led us to where we are. This is the intuition we are going to develop in this lecture.

Consider the example of an online store. If you ask for all the data of an order and get back only its current state, which is what traditional relational thinking would give you, you will probably end up scratching your head as to why this particular order ended up being refunded. But if you think about all the events that influenced this order, you get a much richer description: everything from creating the order to allocating inventory, dispatching, delivering, returning, cancelling, and finally refunding.
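To make the order example concrete, here is a minimal plain-Scala sketch (no Akka involved) of deriving an order's state by replaying its events. The event names, the OrderState fields, and the helpers replay and stateAt are hypothetical, chosen to mirror the lifecycle above, and the events live in an in-memory Vector rather than a real store. Replaying the whole log gives the current state, while replaying only a prefix answers "what did this order look like back then?"

```scala
object OrderHistory {
  // Hypothetical events mirroring the order lifecycle described above.
  sealed trait OrderEvent
  case object OrderCreated extends OrderEvent
  case object InventoryAllocated extends OrderEvent
  case object OrderDispatched extends OrderEvent
  case object OrderDelivered extends OrderEvent
  case object OrderReturned extends OrderEvent
  case object OrderCancelled extends OrderEvent
  case object OrderRefunded extends OrderEvent

  // The derived state: roughly what a "current state" table row would hold.
  final case class OrderState(status: String, inventoryAllocated: Boolean)

  val empty: OrderState = OrderState("none", inventoryAllocated = false)

  // How each event changes the state. Events are facts that already happened, so this never rejects one.
  def applyEvent(state: OrderState, event: OrderEvent): OrderState = event match {
    case OrderCreated       => state.copy(status = "created")
    case InventoryAllocated => state.copy(status = "allocated", inventoryAllocated = true)
    case OrderDispatched    => state.copy(status = "dispatched")
    case OrderDelivered     => state.copy(status = "delivered")
    case OrderReturned      => state.copy(status = "returned")
    case OrderCancelled     => state.copy(status = "cancelled", inventoryAllocated = false)
    case OrderRefunded      => state.copy(status = "refunded")
  }

  // Replaying every event from the beginning rebuilds the current state.
  def replay(events: Seq[OrderEvent]): OrderState =
    events.foldLeft(empty)((state, event) => applyEvent(state, event))

  // Replaying only the first `count` events reconstructs an earlier state.
  def stateAt(events: Seq[OrderEvent], count: Int): OrderState =
    replay(events.take(count))

  val sampleLog: Vector[OrderEvent] = Vector(
    OrderCreated, InventoryAllocated, OrderDispatched, OrderDelivered,
    OrderReturned, OrderCancelled, OrderRefunded
  )

  def main(args: Array[String]): Unit = {
    println(replay(sampleLog))     // current state
    println(stateAt(sampleLog, 3)) // state right after the order was dispatched
    // The same log doubles as a full audit trail of how we got here.
    sampleLog.zipWithIndex.foreach { case (event, i) => println(s"${i + 1}: $event") }
  }
}
```

The refund is no longer a mystery: the log shows it came after a return and a cancellation, which is exactly the information a single current-state row throws away.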
So in this course, for persistent data, we're going to store events instead of the current state. We can always recreate the current state, if we want, by replaying those events. This model is called event sourcing, and it is central to Akka Persistence.

Of course, I'm aware that this is a new mental model, so let's discuss the pros and cons, starting with the good stuff. First, since events are only ever appended, we can get big performance gains on writes, because append-only stores can be implemented very efficiently. Second, the event-sourced model avoids relational databases and object-relational mapping completely. If you've been in this business long enough, you've probably found that ORM is a particular kind of pain: your object-oriented model might become extremely fragmented, or your database might become too normalized. Third, event sourcing can answer both of the hard questions we discussed earlier: querying an earlier state, and keeping a full audit log of how we got to the current state. And finally, event sourcing fits the actor model perfectly, because in the Akka world everything is a message, so we can treat events as messages too.

Like any model, event sourcing is not perfect, so it has its drawbacks. First of all, querying the state is potentially expensive, because we always have to replay all the events. We will sort this out later in the course when we discuss Akka Persistence Query. Second, we can have performance issues with long-lived entities, because their stream of events might become extremely large. This is again a solvable problem: in this very chapter we will discuss snapshots, which were invented for this exact issue. Third, the data model changes over time as your application evolves. This is actually a big pain, and we will treat it later in the course with some special schema evolution techniques that Akka comes prepared with.
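The trade-offs above, cheap append-only writes on one side and potentially long replays on the other, can be illustrated with a small plain-Scala model of a bank account. This is a sketch under stated assumptions, not Akka's journal or snapshot store: Journal, Snapshot, and recover are hypothetical names, amounts are in cents, and sequence numbers are simply 1-based positions in the log. Recovery starts from the latest snapshot, if there is one, and replays only the events written after it.

```scala
object SnapshotRecovery {
  // Hypothetical account events; amounts are in cents.
  sealed trait AccountEvent
  final case class Deposited(amount: Long) extends AccountEvent
  final case class Withdrawn(amount: Long) extends AccountEvent

  def applyEvent(balance: Long, event: AccountEvent): Long = event match {
    case Deposited(amount) => balance + amount
    case Withdrawn(amount) => balance - amount
  }

  // Append-only journal: entries are only ever added at the end, never updated in place.
  final class Journal {
    private var entries = Vector.empty[AccountEvent]

    // Appends an event and returns its 1-based sequence number.
    def append(event: AccountEvent): Long = {
      entries = entries :+ event
      entries.size.toLong
    }

    // All events whose sequence number is greater than `seqNr`.
    def eventsAfter(seqNr: Long): Vector[AccountEvent] = entries.drop(seqNr.toInt)
  }

  // A snapshot records the state as of a given sequence number.
  final case class Snapshot(seqNr: Long, balance: Long)

  // Start from the latest snapshot (or from zero) and replay only the newer events.
  // Returns the recovered balance and how many events had to be replayed.
  def recover(journal: Journal, latest: Option[Snapshot]): (Long, Int) = {
    val start = latest.getOrElse(Snapshot(0L, 0L))
    val newer = journal.eventsAfter(start.seqNr)
    (newer.foldLeft(start.balance)((balance, event) => applyEvent(balance, event)), newer.size)
  }

  def main(args: Array[String]): Unit = {
    val journal = new Journal
    (1 to 1000).foreach(_ => journal.append(Deposited(10L)))
    val snapshot = Snapshot(seqNr = 1000L, balance = recover(journal, None)._1)
    journal.append(Withdrawn(5L))
    println(recover(journal, None))           // (9995,1001): full replay
    println(recover(journal, Some(snapshot))) // (9995,1): snapshot plus one event
  }
}
```

Both recoveries reach the same balance, but the snapshot-based one replays a single event instead of 1,001, which is the whole point of snapshots for long-lived entities.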
All of the above are solvable problems. Finally, there's the mental gap of adopting a new mental model, which most seasoned programmers will probably resist. But frankly, if you've taken this course, you're already eager to learn, and I'm here to make it as smooth as possible for you, so this shouldn't really be an issue.

Having discussed event sourcing as the key principle of Akka Persistence, let's talk about how Akka actually achieves it: with persistent actors. Persistent actors can do everything that normal actors can do: send and receive messages, hold and manage internal state, and of course run in massive numbers in parallel. Persistent actors also have some extra capabilities. First, they have what is called a persistence ID, which identifies the persistent actor in relation to the persistent store; as a best practice, it should be unique per actor. Second, persistent actors can send events to a persistent store, an action we will simply call persisting. Third, a persistent actor is able to recover its state by replaying all the events associated with its persistence ID.

So when a persistent actor handles a message, which we will call a command for the purposes of this course, then in addition to its normal capabilities it can asynchronously persist an event to its persistent store, and after the event is persisted, it can change its internal state. When a persistent actor starts up, or is restarted due to some supervisor strategy, it replays all the events associated with its persistence ID, in the same order they were persisted, so that the actor can recover its state as if nothing had happened.

Let me show you with a diagram. This blue dot is an actor, which you might remember from the Akka Essentials course. The diamond in the middle is the actor's receive handler, and the list at the bottom is the actor's mailbox. Let's assume this geometrical abstraction from now on.
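Here is a plain-Scala sketch of the three extra capabilities just described: a persistence ID, persisting an event before changing state, and recovering by replay. It is not Akka's API (in classic Akka Persistence these roles are played by a PersistentActor's persistenceId, persist, and receiveRecover); InMemoryJournal, PersistentCounter, and the command and event types are hypothetical. Unlike Akka, where persisting is asynchronous, the journal write here is a plain synchronous call, which keeps the example runnable without any dependencies.

```scala
import scala.collection.mutable

object PersistentEntitySketch {
  // Commands ask for something to happen; events record that it did.
  sealed trait Command
  final case class Add(amount: Int) extends Command
  final case class Subtract(amount: Int) extends Command

  sealed trait Event
  final case class Added(amount: Int) extends Event
  final case class Subtracted(amount: Int) extends Event

  // Hypothetical in-memory journal: one append-only event stream per persistence ID.
  final class InMemoryJournal {
    private val streams = mutable.Map.empty[String, Vector[Event]]

    def persist(persistenceId: String, event: Event): Unit =
      streams.update(persistenceId, streams.getOrElse(persistenceId, Vector.empty[Event]) :+ event)

    def eventsFor(persistenceId: String): Vector[Event] =
      streams.getOrElse(persistenceId, Vector.empty[Event])
  }

  final class PersistentCounter(val persistenceId: String, journal: InMemoryJournal) {
    private var total = 0

    // Recovery: before any command is handled, replay this ID's events in persisted order.
    journal.eventsFor(persistenceId).foreach(event => updateState(event))

    def state: Int = total

    private def updateState(event: Event): Unit = event match {
      case Added(n)      => total += n
      case Subtracted(n) => total -= n
    }

    // Command handling: derive an event, persist it, and only then change the state.
    def handle(command: Command): Unit = {
      val event: Event = command match {
        case Add(n)      => Added(n)
        case Subtract(n) => Subtracted(n)
      }
      journal.persist(persistenceId, event) // synchronous here; asynchronous in Akka
      updateState(event)
    }
  }

  def main(args: Array[String]): Unit = {
    val journal = new InMemoryJournal
    val counter = new PersistentCounter("counter-1", journal)
    counter.handle(Add(5))
    counter.handle(Subtract(2))
    // Simulate a restart: a fresh instance with the same persistence ID replays the stream.
    val restarted = new PersistentCounter("counter-1", journal)
    println(s"before restart: ${counter.state}, after restart: ${restarted.state}")
  }
}
```

Creating a second PersistentCounter with the same persistence ID plays the role of a restart: it replays the stored events and ends up in the same state, while a different ID starts from an empty stream. This is also why the persistence ID should be unique per actor.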
Let's also assume that we have a persistent store, which for the purposes of this course we will call a journal. When a command is dequeued from the mailbox and handled by the actor, the actor can persist an event to the journal. This happens asynchronously, and after the journal has finished, the actor can change its state and/or send messages to other actors. After that, the command is simply discarded. This happens over and over: a command is handled, an event is persisted, the actor might or might not send messages to other actors, and then the command is discarded.

Now suppose we have a journal, and a persistent actor is just starting up or has been restarted as a result of some supervisor strategy. Before handling any commands whatsoever, the actor will automatically query the journal for all events associated with its persistence ID. If the journal has events for it, they will be replayed to the actor in the same order they were persisted, and the actor will change its state as a result. After that, the actor is free to receive further messages. If the actor receives commands during the recovery phase, they are simply stashed until recovery has completed, which I'm going to show you in code a few lectures from now.

So this is how event sourcing and persistent actors work in a nutshell. This lecture was intentionally light on theory because I want to show you everything in actual code, starting in the next video.
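The recovery phase, including the stashing of commands that arrive mid-recovery, can be modeled in plain Scala as below. In Akka this stashing is done for you; here it is written out by hand so the ordering is visible. RecoveringAccount and its onReplayedEvent and onRecoveryCompleted callbacks are hypothetical names (loosely echoing Akka's RecoveryCompleted signal), and the sketch is single-threaded rather than a real actor.

```scala
import scala.collection.mutable

object RecoveryStashSketch {
  final case class Deposit(amount: Int)   // command
  final case class Deposited(amount: Int) // event

  // Hypothetical, single-threaded stand-in for a persistent actor's recovery phase.
  final class RecoveringAccount {
    private var balance = 0
    private var recovering = true
    private val stashed = mutable.Queue.empty[Deposit]
    // Events persisted after recovery (the replayed ones are already in the journal).
    val persisted = mutable.ArrayBuffer.empty[Deposited]

    // Called once per event the journal replays, in the order the events were persisted.
    def onReplayedEvent(event: Deposited): Unit = balance += event.amount

    // Called when replay is finished: leave recovery mode, then handle stashed commands in arrival order.
    def onRecoveryCompleted(): Unit = {
      recovering = false
      while (stashed.nonEmpty) process(stashed.dequeue())
    }

    // Commands that arrive mid-recovery are stashed instead of running against a half-rebuilt state.
    def receive(command: Deposit): Unit =
      if (recovering) stashed.enqueue(command) else process(command)

    def currentBalance: Int = balance

    private def process(command: Deposit): Unit = {
      val event = Deposited(command.amount)
      persisted += event        // persist first...
      balance += event.amount   // ...then change state
    }
  }

  def main(args: Array[String]): Unit = {
    val account = new RecoveringAccount
    account.receive(Deposit(7))             // arrives while recovery is running: stashed
    account.onReplayedEvent(Deposited(100)) // replay of previously persisted events
    account.onReplayedEvent(Deposited(50))
    account.onRecoveryCompleted()           // the stashed Deposit(7) is handled now
    println(account.currentBalance)         // 157
  }
}
```

A command that arrives before replay finishes waits in the stash, so it is applied on top of the fully recovered balance (150 + 7 = 157) rather than on top of an empty one.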