Chapter 8. Implementing Persistence and Transactions

In real-life scenarios, we need process instances that run for many hours, days, or even years. If we want to run processes for such a long time, we cannot rely on something as volatile as a server's memory to keep track of the status of all our process instances. We need a way to persist them.

This chapter focuses on providing a persistence and transaction mechanism to our process engine environment, which will allow us to store enough information about our executions to be able to recreate them afterwards. This feature brings the possibility of having more than one thread or server trying to access the same persisted runtime environment at the same time. So, we will also learn how to generate a transaction around said persistence to make sure no concurrency issues occur.

Persistence and transactions are hard topics for newcomers, mostly because they involve many configuration points that must be carefully orchestrated to get the expected behavior from our application. By the end of this chapter, you will have learned about the following:

  • How jBPM6 persistence works
  • How to configure the different persistence and transaction components
  • Why we need transactions in our systems

Why do we need persistence and transactions?

So far, we've dealt with very short-lived processes. Running inside a JUnit test, process instances last only a few milliseconds, and keeping them in memory is sufficient for those environments. However, in real-life situations, interactions between systems and humans usually involve long wait states. This is especially relevant for processes with human tasks, where users with a backlog of work might have a pending task assigned to them for hours, days, or even longer. For each process in your environment, you should be able to determine whether it should be persistent or nonpersistent, depending on how long-lived its process instances will be. From this perspective, you will find these two top-level categories:

  • In-memory processes
  • Long-running processes (also known as persistent processes)

In-memory processes are usually short-lived processes that perform entirely automatic, synchronous interactions. Depending on the complexity of each automatic interaction, some processes take longer to complete than others, but for this type of process, durations typically range from a few seconds to a few minutes.

Long-running processes, on the other hand, might take several minutes to complete even in the fastest case, with more realistic durations ranging from hours to years, depending on the nature of the process. Keeping an in-memory reference for such processes is impractical, and in some cases impossible to guarantee. For such processes, we need a persistence mechanism that keeps a recoverable reference to the process, available whenever it is needed.

Several questions will help you determine whether your process will need to be persisted or not:

  • Does your process interact with other components? How slow are they? How prone to failure are they? If external interactions are really slow and asynchronous by nature, or prone to errors and retries, then a persistent process will be able to handle it better than an in-memory process.
  • Should the process be able to recover from a failure state? If we need to recover manually from an error state, we must keep a reference to the process so that we can recover it later on.
  • Does the process interact with humans? Human interactions are intrinsically slow from a computer perspective. Persistent processes should be used whenever human interactions are involved.
  • Does the process have a high demand? This might imply that we need to handle the process in a distributed environment. A persistent process will allow you to retrieve an existing process from any other system with connection to the database.

Persisting a process's internal state involves several components that we need to understand in order to take full advantage of the mechanism. We'll explain each of them so that you can configure persistence in the way that best suits your needs.

Persisting long running processes

The main reason we need to persist processes is that they are yet to be finished. From the process's perspective, we say that the process is still active, which means its business goal hasn't been accomplished yet. From its internal perspective, the process is still running; however, from a more technical view (and if we are using asynchronous work item handlers), the process is waiting for some external interaction, either a human interaction or a system interaction. To understand these wait states, consider the following process flow:

[Figure: process definition flow with a Start event followed by the User task User Interaction 1]

As you can see in the preceding process definition flow, the first thing a process instance will do is execute the Start event, and then the User task User Interaction 1 will be started. However, since it is an actual human interaction, the process instance will wait for the user to complete the task before continuing. As we saw in Chapter 6, Human Interactions, this leaves the process instance in an active state until the user interaction is finished. The following process flow shows how this interaction looks from a technical perspective:

[Figure: technical view of the same interaction, divided into time lapses starting with t1]

As you can see, there are four time lapses in the preceding flow to which we should pay attention. The first one, t1, starts when we call the startProcess method on a KIE session object. Once the execution reaches the start of a User task, it returns control to the invoking thread and waits for the User task to be finished. This could take a really long time, and we don't want to depend entirely on memory until a user decides to finish the task. This is one of the reasons persistence is needed for jBPM6.
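
The wait state described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: the names `WaitStateSketch`, `ToyProcessInstance`, and `STORE` are hypothetical and are not part of the jBPM6 API, and an in-memory map of serialized bytes stands in for the database. It shows the general idea of running until a wait state, storing a snapshot, returning control to the caller, and reloading the snapshot when the user finally completes the task.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Toy model only: none of these types belong to jBPM6.
class WaitStateSketch {

    enum State { ACTIVE, COMPLETED }

    // Minimal snapshot of a process instance: its id, state, and current node.
    static class ToyProcessInstance implements Serializable {
        private static final long serialVersionUID = 1L;
        final long id;
        State state = State.ACTIVE;
        String currentNode = "Start";

        ToyProcessInstance(long id) {
            this.id = id;
        }
    }

    // Stands in for a database table: instance id -> serialized snapshot.
    static final Map<Long, byte[]> STORE = new HashMap<>();

    // Runs the instance up to the user task (a wait state), stores it,
    // and returns control to the caller.
    static long startProcess(long id) {
        ToyProcessInstance instance = new ToyProcessInstance(id);
        instance.currentNode = "User Interaction 1";
        save(instance);
        return id;
    }

    // Called later, possibly after a restart: reload, finish, store again.
    static void completeUserTask(long id) {
        ToyProcessInstance instance = load(id);
        instance.currentNode = "End";
        instance.state = State.COMPLETED;
        save(instance);
    }

    static void save(ToyProcessInstance instance) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(instance);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        STORE.put(instance.id, bytes.toByteArray());
    }

    static ToyProcessInstance load(long id) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(STORE.get(id)))) {
            return (ToyProcessInstance) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Between `startProcess` and `completeUserTask`, no live object has to be kept around: the only thing that survives is the stored snapshot, which is what allows the wait to last hours or days.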

Persistence is also needed for external system interactions, because of two main characteristics:

  • An external system interaction might take a lot of processing time. This could be because it's using a slow, overused system. The problem is we might have many different process invocations waiting for the next state in a single BPM system, and if we leave all waiting states on external interactions in memory, we might exceed the BPM system capacity.
  • The external system could fail and need a retry at some other time. This is mostly a consideration for asynchronous external interactions rather than for persistence. However, while the process instance is waiting to do a retry at a given time, we can release resources from the server if we store those process instances now and reload them later on.

Now that we have understood that we might need to store and reload our process instances at different time intervals, we have to deal with another particular problem: making sure two different threads don't try to access the same process instance at the same time, causing an internal conflict. The solution for said problem is also the solution to some other much more important issues, which we will discuss in the next subsection.

The server failover and distribution mechanism

Whenever we have generic systems that cover very diverse cases in our organization, it becomes more and more important to have a way to scale such applications. For a BPM system, process persistence is the key to doing so. This is because it not only releases precious memory while process instances are idle, but also allows several nodes in a high-availability grid to manage those processes, all synchronized through the database.

However, this carries another issue that we should discuss: the possibility that two different servers (or even two different threads in the same server) may try to access and change the same process instance simultaneously. This can be solved in the same server with very simple in-memory synchronization mechanisms. However, when you have different servers competing for the same process instance, we need something more powerful, that is, transaction management.
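
The single-server case mentioned above can be sketched with one lock per process instance. This is a minimal illustration rather than jBPM6 code: the class `InstanceLocksSketch` and its method `withInstanceLock` are hypothetical names introduced here. Note that this approach only protects threads inside one JVM, which is exactly why it stops being enough once several servers are involved.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical helper, not part of jBPM6: one lock per process instance id.
class InstanceLocksSketch {

    private final Map<Long, ReentrantLock> locks = new ConcurrentHashMap<>();

    // Runs 'work' while holding the lock that belongs to one process instance,
    // so two threads in this JVM never modify the same instance at once.
    void withInstanceLock(long processInstanceId, Runnable work) {
        ReentrantLock lock =
            locks.computeIfAbsent(processInstanceId, id -> new ReentrantLock());
        lock.lock();
        try {
            work.run();
        } finally {
            lock.unlock(); // always release, even if 'work' throws
        }
    }
}
```

Threads working on different process instance ids use different locks, so they do not block each other.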

Transaction management allows us to make sure that different servers will not work on the same process instance at the exact same time, and that changes to a process instance will not be persisted if a runtime error occurs (a guarantee provided by the rollback capability of transactions).
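
Both guarantees can be illustrated with a version check, a common technique usually called optimistic locking. The following is a sketch under stated assumptions, not a description of jBPM6 internals: `OptimisticStoreSketch`, `Row`, and `commit` are hypothetical names, and a concurrent map stands in for the shared database. A commit succeeds only if nobody else has committed since the row was read, and nothing is written if the work throws.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.UnaryOperator;

// Hypothetical store, not part of jBPM6: illustrates conflict detection and rollback.
class OptimisticStoreSketch {

    // Immutable row: the stored state plus a version bumped on every commit.
    static final class Row {
        final String state;
        final int version;

        Row(String state, int version) {
            this.state = state;
            this.version = version;
        }
    }

    private final Map<Long, Row> rows = new ConcurrentHashMap<>();

    void insert(long id, String state) {
        rows.put(id, new Row(state, 0));
    }

    Row read(long id) {
        return rows.get(id);
    }

    // Applies 'work' to the row read earlier and commits the result only if
    // no other writer committed in between. If 'work' throws, the exception
    // propagates before anything is written, which acts as a rollback.
    boolean commit(long id, Row readEarlier, UnaryOperator<String> work) {
        String newState = work.apply(readEarlier.state);
        Row updated = new Row(newState, readEarlier.version + 1);
        // Atomic compare-and-swap: succeeds only if the stored row is still
        // the exact object that was read (Row does not override equals).
        return rows.replace(id, readEarlier, updated);
    }
}
```

If two servers read the same row and both try to commit, the first commit returns true and the second returns false, so the second server knows it must reload the process instance and try again instead of silently overwriting the other server's work.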

Now that we understand the need for persistence and transactions in a BPM system, it's time to analyze how these components are provided in jBPM6.
