Announcing Zeebe 0.25.1 & Operate 0.25.0

by Zeebe & Operate Team on Nov 6 2020 in Releases.

A new release for Zeebe 0.25.1 and Operate 0.25.0 is available now. You can grab the releases via the usual channels:

As usual, if you’d like to get started immediately, you can find information about it directly on the Zeebe & Operate documentation website.

It is possible to perform a rolling upgrade to Zeebe 0.25 from the 0.24.4 release, but not from earlier 0.24 patch releases. For Camunda Cloud users, 0.25.1 will become the new production version within the next week, and your existing clusters will be upgraded automatically.

We’d like to add a special shout out to @aivinog1 for all his contributions in the last few months. Thank you Alexey!

Here are some highlights:

In the rest of this post, we’ll go into more details about the changes that the latest stable releases bring.

Zeebe 0.25.1

As seen from the highlights, this release focused both on continuing to improve stability as well as providing ways for users to configure Zeebe to get the best performance. Most bug fixes have already been released over the last quarter as patch releases for 0.23.x and 0.24.x.

Broker improvements

Allow disabling explicit flushing in Raft

By default, Raft followers flush written entries to disk before acknowledging them to the leader. This is required to ensure consistency; otherwise, a follower could lose an acknowledged entry, which would invalidate the quorum. Most users should not disable this, as it can cause logs to become inconsistent in replicated clusters, but it’s now possible to do so. This should only be used by advanced users who wish to trade fault tolerance for a performance gain.

Expose API to trigger snapshot

Taking snapshots allows Zeebe to truncate its log and reduce its disk usage. Normally, this happens automatically after a configurable period. This release introduces an API to trigger snapshots without waiting for the snapshot interval. This can be helpful for testing, or to reduce the number of records that have to be reprocessed when upgrading.

Detect reprocessing issues when upgrading

Reprocessing may fail after upgrading to a new version due to changes in the workflow engine’s logic. To mitigate the impact of this, the broker can now inform the user if any such issues occur when upgrading and what can be done to solve them.

Configure RocksDB column family options

It’s now possible to configure the RocksDB column family options in the Zeebe configurations. These options may be useful to tune RocksDB’s performance for specific use cases. An example of these configurations can be found in the template for the broker configuration in the distribution’s config directory.

Support ‘now’ for date and time constructor

It’s now possible to use now() in FEEL expressions in timer events, which evaluates to the current date-time.

Raft’s serialization format is backwards compatible

Making improvements and bug fixes in the Raft implementation often requires changes to data that is either persisted or sent over the network. This posed an issue, as it implied breaking backwards compatibility. This release fixes the issue by introducing a way to make backwards compatible changes. Enabling this feature required some preparation in order to maintain backwards compatibility with previous releases, which is why the rolling upgrade is only possible from 0.24.4: it is the only release which supports both the old and new format.

Support memory mapped log stream segments

By default, Zeebe uses file channels to read its log segments. Although previous releases had already introduced an optional optimization which makes use of memory mapped log segments, it was unsafe to use it in replicated clusters. This release makes it safe to do so, although it should still be considered experimental as it’s not fully mature. This setting can be enabled under zeebe.broker.data.useMmap.

Raft duplicate leader bug fixes

We’ve also fixed two serious issues in our Raft implementation. The first would cause a recently deposed leader to commit its uncommitted entries without acquiring a quorum when receiving new entries from the new leader. The second would cause a node to vote for two candidates in the same term, which means two leaders could be elected for the same term. Both of these bugs could result in logs diverging and becoming corrupted. It’s interesting to note that both were discovered by our new randomized tests for the Raft implementation. In every CI build, these tests generate new random operation sequences which, over time, allow us to test many different executions that would be impossible to test manually.
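
The second bug concerns a basic Raft safety rule: a node may grant at most one vote per term. The sketch below illustrates that rule in isolation. The class and method names are hypothetical, and this is not Zeebe's actual Raft code; it also leaves out the log up-to-date check and the persistence of the vote that a real implementation needs.

```java
import java.util.Objects;

/** Illustrative only: tracks the vote a Raft node has granted in its current term. */
final class VoteState {
  private long currentTerm = 0;
  private String votedFor = null; // candidate voted for in currentTerm, if any

  /** Returns true if the vote is granted to the candidate for the given term. */
  synchronized boolean requestVote(long term, String candidateId) {
    Objects.requireNonNull(candidateId);
    if (term < currentTerm) {
      return false; // stale request from an older term
    }
    if (term > currentTerm) {
      currentTerm = term; // newer term: the old vote no longer applies
      votedFor = null;
    }
    if (votedFor == null || votedFor.equals(candidateId)) {
      votedFor = candidateId; // first vote in this term, or a retry by the same candidate
      return true;
    }
    return false; // already voted for a different candidate in this term
  }
}
```

With this rule in place, two candidates cannot both collect a majority in the same term, because no node counts towards both.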

Shorter restart times

Restarting a node often took a long time due to the need for a snapshot to be fully replicated and installed. Optimizing this process has brought the restart time down by several orders of magnitude. There are also new Grafana metrics to help monitor restart performance.

Exporting bug fix

Previously, exporters were not updating the exported position if records were filtered. Fixing this bug prevents Zeebe’s log from growing without being compacted in a low load scenario where only filtered records are written.
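
A minimal sketch of the idea behind the fix, using hypothetical names (this is not the exporter API): the exported position advances for every record the exporter has seen, including the ones it filters out, so compaction is not held back by records that will never be exported.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Illustrative only: tracks the last log position an exporter is done with. */
final class ExportTracker {
  private final Predicate<String> acceptsType; // hypothetical filter on the record type
  private final List<Long> exported = new ArrayList<>();
  private long lastExportedPosition = -1;

  ExportTracker(Predicate<String> acceptsType) {
    this.acceptsType = acceptsType;
  }

  void onRecord(long position, String recordType) {
    if (acceptsType.test(recordType)) {
      exported.add(position); // stand-in for actually exporting the record
    }
    // Advance the position even when the record was filtered out.
    lastExportedPosition = position;
  }

  long lastExportedPosition() {
    return lastExportedPosition;
  }

  List<Long> exported() {
    return exported;
  }
}
```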

Improved workflow validation

Zeebe’s workflow validation has been improved. These improvements include things like preventing workflows from having empty error events and invalid timer cycles.

Support minimum free disk space

It’s now possible to configure parameters that ensure a minimum amount of free disk space is always kept available. To avoid violating this limit, a Zeebe broker will step down.
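
For illustration, a free-space check of this kind can be expressed in a few lines of Java. This is a sketch under assumptions, not how the broker implements it: the class and method names are hypothetical, and the threshold is passed in directly rather than read from the broker configuration.

```java
import java.io.File;

/** Illustrative only: checks whether a directory's partition has enough usable space. */
final class DiskSpaceCheck {
  static boolean hasEnoughFreeSpace(File dataDirectory, long minFreeBytes) {
    // getUsableSpace() reports the bytes available to this JVM on the partition.
    return dataDirectory.getUsableSpace() >= minFreeBytes;
  }
}
```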

Client improvements

Specify resource name when deploying with zbctl

You can now specify a custom resource name when deploying workflows with zbctl. This is helpful to prevent deploying duplicate workflows when using different clients since, for duplicate workflows to be filtered, they must have the same resource name.

Get message key as response from publish command

When publishing a message, the response to the command will contain an identifying key for the published message.

Clients send client type and version in authentication requests

Both the Java and Go clients now add information about their type and version in the user-agent of the auth requests. This information is useful when investigating issues that stem from the interaction between Zeebe and specific clients.

JobWorker interface can be used with try-with-resources

The JobWorker interface in the Java client now implements the AutoCloseable interface, which makes it possible to use a JobWorker in a try-with-resources block.
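
A minimal sketch of what this enables. To keep it self-contained, the Worker class below is a hypothetical stand-in for the client's JobWorker, not the real type; the only thing it shares with it is that it is AutoCloseable.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a job worker; only the AutoCloseable contract matters here. */
final class Worker implements AutoCloseable {
  private final List<String> events;

  Worker(List<String> events) {
    this.events = events;
  }

  void handle(String job) {
    events.add("handled " + job);
  }

  @Override
  public void close() { // declares no checked exception, so callers need no catch block
    events.add("closed");
  }
}

final class WorkerDemo {
  static List<String> run() {
    List<String> events = new ArrayList<>();
    try (Worker worker = new Worker(events)) {
      worker.handle("job-1");
    } // close() is called here automatically, also if the body throws
    events.add("after");
    return events;
  }
}
```

Calling WorkerDemo.run() records the events in the order "handled job-1", "closed", "after", which shows that the worker is closed as soon as the block ends.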

Operate 0.25.0

LDAP Authentication

With Operate 0.25, we added support for connecting Operate to your own LDAP server to authenticate users. You can read more about how to configure the LDAP connection in our documentation.

Connect To Secured Elasticsearch

As often requested by users, and to align with the capability of Zeebe, Operate can now be configured to connect to a secured Elasticsearch instance. See the configuration section of our documentation.

Default Elasticsearch Indices Configuration

It is now possible to set the default number of shards and replicas of the Operate Elasticsearch indices in the configuration of Operate. See the documentation for more information.

Improved Migration/Archiving of Large Workflow Instances

In Zeebe, a workflow instance can have a large number of activities, for example if a loop or multi-instance sub-process is involved. Migrating or archiving one of these instances can therefore take a noticeable amount of time. In previous versions of Operate, this could lead to requests timing out between Operate and Elasticsearch. With the latest version of Operate, we are now using the Elasticsearch Task API to better handle such long-running data modifications.

Liveness/Readiness Check

With the new version, we have adjusted the exposed liveness and readiness checks to align with common best practices in Spring Boot. See the documentation for details on what changed.

Note about Zeebe 0.25.0

As you might have noticed, in this announcement we are referring to Zeebe 0.25.1, and not 0.25.0. The Zeebe 0.25.0 release contains a feature which was intended to detect anomalies while upgrading a Zeebe cluster. Before announcing the release to the public, we discovered an issue with this feature which could lead to a degraded user experience during normal runs. This feature was originally built to help us ensure that upgrades are safe and, as such, should not impact the normal usage of Zeebe. As a compromise, we’ve added a feature flag in 0.25.1 which lets you turn the detection on and off. By default it is off, and we recommend that users turn it on during upgrades. This can be done via the experimental configuration flag zeebe.broker.experimental.detectReprocessingInconsistency = true (or the environment variable ZEEBE_BROKER_EXPERIMENTAL_DETECTREPROCESSINGINCONSISTENCY="true").
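
The property name and the environment variable name above are related by a simple convention: the environment variable is the property name in upper case, with dots replaced by underscores. A small sketch of that mapping follows. The helper name is hypothetical, and the sketch only covers names of the shape shown in this post.

```java
import java.util.Locale;

/** Illustrative only: derives an environment variable name from a dotted property name. */
final class EnvNames {
  static String toEnvVar(String property) {
    // Dots become underscores, and the whole name is upper-cased.
    return property.replace('.', '_').toUpperCase(Locale.ROOT);
  }
}
```

For example, EnvNames.toEnvVar("zeebe.broker.experimental.detectReprocessingInconsistency") returns "ZEEBE_BROKER_EXPERIMENTAL_DETECTREPROCESSINGINCONSISTENCY".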

You can read more about the recommended upgrade procedure in our documentation.

Get In Touch

There are a number of ways to get in touch with the Zeebe community to ask questions and give us feedback.

We hope to hear from you!