Announcing Zeebe Node Client 0.23
The 0.23.2 version of the Zeebe Node Client is out and available via NPM.
There was a critical build error in 0.23.0 and 0.23.1. Special thanks to @myfjdthink for reporting it, and @lwille for the Pull Request that fixed it.
The complete Change Log is at the end of this post, with the full list of known issues (for the first time in a release - there is one), breaking changes, new features, and fixes.
In this post, we’ll also feature a few of the major changes in this release in detail.
- Known Issue: Connection Error debouncing
- Fix: Transparent Reconnection after K8s pod reschedule
- Feature: Developer-friendly logging
- Feature: Batch Worker
- Version 0.23.0 Changelog
This release is a major refactor, replacing the previous underlying gRPC implementation (based on the C client) with a pure JS implementation of gRPC.
This means that the build chain for projects is simplified. Previously, you needed to rebuild your Node.js Zeebe projects for the target Operating System. This meant that a developer could not `npm install` on a Mac or Windows workstation and simply mount the development directory in a Linux Docker container.

Now, with the pure JS client, the same `npm install` works across operating systems.
As well, prior to the 0.23.0 release, Docker files required either a multi-step build or a Docker container with the build chain in it, and Electron projects required the use of electron-rebuild.
So build complexity, build times, and eventual project size have all been reduced.
For example, the installed overhead of a Zeebe Node project has been reduced from 50MB to 27MB.
Known Issue: Connection Error debouncing
The pure JS gRPC client has some different behavioural characteristics to the C client, and this results in the first Known Issue in a zeebe-node release.
Because the channel behaviour differs, the internal error and retry handling has been modified. The unit tests all pass, so as far as we can tell the behaviour is the same - with one exception that we know about: the `onConnectionError` handler is not correctly debounced in this release. This means that for a single error event, your error handler or `connectionError` event listener may be invoked multiple times. If you are using this feature, make sure that the lack of debouncing does not impact your application. This will be addressed in a future release.
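In the meantime, you can debounce the handler yourself. Here is a minimal sketch: the generic `debounce` helper is mine, and the commented-out worker wiring assumes the `onConnectionError` option described above.

```typescript
// Minimal leading-edge debounce: the wrapped handler fires at most once
// per `windowMs`, collapsing the repeated invocations described above.
function debounce<T extends unknown[]>(
  fn: (...args: T) => void,
  windowMs: number
): (...args: T) => void {
  let last = 0
  return (...args: T) => {
    const now = Date.now()
    if (now - last >= windowMs) {
      last = now
      fn(...args)
    }
  }
}

// Hypothetical wiring with a worker's connection error hook:
// const worker = zbc.createWorker({
//   taskType: 'demo-task',
//   taskHandler: handler,
//   onConnectionError: debounce(() => console.log('connection error'), 1000),
// })

let calls = 0
const onError = debounce(() => { calls++ }, 1000)
onError()
onError()
onError()
console.log(calls) // → 1
```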
There may be other behavioural characteristics that are not covered in the unit and integration tests, so if you discover something, please open a GitHub issue, and we’ll take a look at it.
Fix: Transparent Reconnection after K8s pod reschedule
As part of the work implementing the pure JS version, we created a test using a custom version of testcontainers-node to test the behaviour of the Node client when a broker is stopped, then restarted - simulating a Kubernetes pod reschedule.
Prior to 0.23.0, a broker restart required a restart of any connected workers, as the gRPC channel would fail to reconnect. With the 0.23.0 release, the client now transparently reconnects. Users of Camunda Cloud (which offers a hosted Zeebe service), where pods are periodically rescheduled, will be happy about this.
Again, if you encounter issues with this resiliency, please open a GitHub issue.
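The reconnection behaviour rests on a retry loop (the changelog below calls it a stalled retry timer). As a rough illustration of the idea - not the zeebe-node internals - a helper like this keeps retrying until the broker is reachable again; `retryUntilUp` and its parameters are hypothetical names.

```typescript
// Keep retrying a failed connection attempt at a fixed interval until it
// succeeds, instead of giving up on the first channel error.
async function retryUntilUp<T>(
  attempt: () => Promise<T>,
  retryDelayMs: number,
  maxAttempts: number
): Promise<T> {
  let lastError: unknown
  for (let i = 0; i < maxAttempts; i++) {
    try {
      return await attempt()
    } catch (err) {
      lastError = err // broker still away: wait, then try again
      await new Promise(res => setTimeout(res, retryDelayMs))
    }
  }
  throw lastError
}

// Simulate a broker that is down for two attempts, then reachable:
let attempts = 0
const connect = async (): Promise<string> => {
  attempts++
  if (attempts < 3) throw new Error('UNAVAILABLE')
  return 'connected'
}

retryUntilUp(connect, 10, 5).then(r => console.log(r)) // → connected
```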
Feature: Developer-friendly logging
Prior to 0.23.0, the Zeebe Node client used a structured JSON log output by default. This is useful if you are sending your docker logs to Logstash or a similar log collector, but is not very friendly for development on a workstation.
In the 0.23.0 release, the default logger has been changed to a simple text logger that provides a more human-friendly output.
You can turn on JSON structured logging via an environment variable for production systems.
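To see the difference between the two formats, here is a simplified sketch - stand-in formatters for illustration, not the library's `ZBSimpleLogger` / `ZBJsonLogger` implementations.

```typescript
// Illustration of flat vs structured log output.
interface LogEntry {
  level: 'INFO' | 'ERROR'
  message: string
  time?: string
}

// Human-friendly flat text - the style of the new default logger:
const simpleFormat = (e: LogEntry): string => `${e.level} ${e.message}`

// Machine-friendly structured JSON, for Logstash-style collectors:
const jsonFormat = (e: LogEntry): string =>
  JSON.stringify({ ...e, time: e.time ?? new Date().toISOString() })

const entry: LogEntry = { level: 'INFO', message: 'worker started' }
console.log(simpleFormat(entry)) // INFO worker started
console.log(jsonFormat(entry))   // {"level":"INFO","message":"worker started","time":"..."}
```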
Feature: Batch Worker
Version 0.23.0 introduces a new worker type - the `ZBBatchWorker`. This worker allows you to batch jobs in the worker, then process them all at once when a predefined capacity or timeout is reached.

The `ZBBatchWorker` batches jobs before calling the job handler. Its fundamental differences from the `ZBWorker` are:
- Its job handler receives an array of one or more jobs.
- The jobs have `forwarded` methods attached to them.
- The handler is not invoked immediately, but rather when enough jobs are batched, or a job in the batch is at risk of being timed out by the Zeebe broker.
You can use the batch worker if you have tasks that benefit from processing together, but are not related in the BPMN model.
An example would be a high volume of jobs that require calls to an external system, where you have to pay per call to that system. In that case, you may want to batch up jobs, make one call to the external system, then update all the jobs and send them on their way.
Version 0.23.0 Changelog
Known Issues

Things that don’t work or don’t work as expected, and which will be addressed in a future release.
- The `onConnectionError` event of the ZBClient and ZBWorker/ZBBatchWorker is not debounced, and may be called multiple times in succession when the channel jitters, or the broker is not available. See #161.
Breaking Changes

Changes in APIs or behaviour that may affect existing applications that use zeebe-node.
- `job.customHeaders` in the worker job handler are now typed as read-only structures. This will only be a breaking change if your code relies on mutating these data structures. See the section “Working with Workflow Variables and Custom Headers” in the README for an explanation of doing deep key updates on the job variables.
- The ZBClient no longer eagerly connects to the broker by default. Previously, it did this by issuing a topology command in the constructor. This allows an `onReady` event to be emitted once the connection is established. You can re-enable the eager connection behaviour by either passing `eagerConnection: true` to the client constructor options, or setting the corresponding environment variable to `true`. See #151.
- The library now logs with the simplified `ZBSimpleLogger` by default, for friendly human-readable logs. This will only be a breaking change if you currently rely on the structured log output. To get the previous structured log behaviour, pass `stdout: ZBJsonLogger` to the `ZBClient` constructor options, or set the corresponding environment variable to `JSON`. Refer to the “Logging” section in the README.
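Since `job.customHeaders` (and the job variables) are now typed as read-only, update them immutably with spreads rather than mutating in place. A minimal sketch, using a simplified stand-in for the job shape:

```typescript
// Stand-in job shape: nested read-only data, as the worker handler sees it.
interface Job {
  readonly variables: {
    readonly order: { readonly total: number; readonly currency: string }
  }
}

const job: Job = { variables: { order: { total: 100, currency: 'EUR' } } }

// Deep key update: copy each level you touch, reuse the rest.
const updated = {
  ...job.variables,
  order: { ...job.variables.order, total: 120 },
}

console.log(updated.order.total)       // → 120
console.log(job.variables.order.total) // → 100 (original untouched)
```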
New Features

New shiny stuff.
- The underlying gRPC implementation has been switched to the pure JS @grpc/grpc-js. This means no more dependency on node-gyp or binary rebuilds for Docker containers / Electron; and a slim-down in the installed package size from 50MB to 27MB.
- Timeouts can now be expressed with units using the typed-duration package, which is included in and re-exported by the library. See the README section “A note on representing timeout durations”.
- There is a new `ZBBatchWorker`. This allows you to batch jobs that are unrelated in a BPMN model, but are related with respect to some (for example: rate-limited) external system. See the README for details. Thanks to Jimmy Beaudoin (@jbeaudoin11) for the suggestion, and helping with the design. Ref: #134.
- `ZBClient.createWorker` has two new, additional method signatures. The first is a single object parameter signature. This is the preferred signature if you are passing in configuration options. The second signature is a version of the original that elides the `id` for the worker. With this, you can create a worker with just a task type and a job handler. A UUID is assigned as the worker id. This is the equivalent of passing in `null` as the first parameter to the original signature. The previous method signature still works, allowing you to specify an id if you want. See this article for details.
- There is now a `ZBLogMessage` interface to help you implement a custom logger (#127). For an example of a custom logger, see the Zeebe GitHub Action implementation.
- There is a new custom logger implementation, `ZBSimpleLogger`, that produces flat string output. If you are not interested in structured logs for analysis, this log is easier for humans to read.
- The `ZBClient` now contains an `activateJobs` method. This effectively exposes the entire Zeebe gRPC API, and allows you to write applications in the completely unmanaged style of the Java and Go libraries, if you have some radically different idea about application patterns.
- The gRPC layer has been refactored to implement the idea of “connection characteristics”. When connecting to Camunda Cloud, which uses TLS and OAuth, the library previously emitted errors every time. The refactor allows these connection errors to be correctly interpreted as expected behaviour of the connection characteristics. You can also set an explicit initial connection tolerance in milliseconds for any broker connection with the environment variable `ZEEBE_INITIAL_CONNECTION_TOLERANCE`. See this article, issue #133, and the README section “Initial Connection Tolerance” for more details.
- The connection tolerance for transient drop-outs before reporting a connection error is now configurable via the environment variable `ZEEBE_CONNECTION_TOLERANCE`, as well as via the existing constructor argument.
- The integration tests have been refactored to allow them to run against Camunda Cloud. This required dealing with a Zeebe broker in an unknown state, so all tests now template unique process ids, unique task types, and unique message names to avoid previous test run state in the cluster interfering with subsequent test runs.
- I’ve started documenting the internal operation of the client in BPMN diagrams. These can be found in the repository.
- The README now contains a section “Writing Strongly-typed Job Workers”, on writing typed workers in TypeScript.
- The README also has a shiny TOC. It has grown in size such that one is needed.
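The idea behind strongly-typed workers can be sketched as follows - the interfaces here are simplified stand-ins for illustration, not the zeebe-node type definitions.

```typescript
// The variable and output shapes are type parameters, so the compiler
// checks what the handler reads and what it returns.
interface TypedJob<Vars, Output> {
  variables: Vars
  complete: (output: Output) => string
}

interface InputVars { orderId: string; amount: number }
interface OutputVars { approved: boolean }

const handler = (job: TypedJob<InputVars, OutputVars>): string => {
  // job.variables.amount is a number here; a typo in the key name
  // would fail to compile.
  return job.complete({ approved: job.variables.amount < 1000 })
}

// Exercise the handler with a fake job:
const fakeJob: TypedJob<InputVars, OutputVars> = {
  variables: { orderId: 'A-1', amount: 250 },
  complete: o => (o.approved ? 'COMPLETED:approved' : 'COMPLETED:rejected'),
}
console.log(handler(fakeJob)) // → COMPLETED:approved
```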
Fixes

- An unmaintained package in the dependency tree of kafka-node (and arguably a bug in NPM’s de-duping algorithm) caused zeebe-node to break by installing the wrong version of the `long` dependency, unless the two packages were installed in a specific order. We’ve explicitly added `long` to the dependencies of zeebe-node to address this, and reported it to kafka-node. Thanks to @need4eat for discovering this and helping to track down the cause. See #124.
- Prior to 0.23.0 of the zeebe-node client, workers would not reconnect if the broker was restarted, throwing gRPC channel errors until they were restarted. A stalled retry timer has been added to the worker. The worker will now automatically reconnect when the broker is available, if it goes away and comes back. See #145, and #152.
- Prior to 0.23.0, a worker would periodically lose its connection to Camunda Cloud. This has been addressed with the stalled retry timer. See #99.