Apache Kafka integration

From Fusion Registry Wiki
Revision as of 02:33, 7 February 2020 by Glenn (talk | contribs) (Created page with "=Version= Fusion Registry 10.0.2 =Overview= Fusion Registry can act as an Apache Kafka Producer where specified events are published on definable topics. The library of possi...")
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search

Version

Fusion Registry 10.0.2

Overview

Fusion Registry can act as an Apache Kafka Producer where specified events are published on definable topics.

The library of possible events is extensible with the expection that new events can be easily added as required when use cases emerge. At present, the Kafka publication system recognises only one event, namely 'Structure Notification' which publishes changes to structures in a similar way to that of the RSS feed and the email notification subscription process.

Other events are envisaged include:

  • Anything that's audited including data registrations, user login events etc
  • Errors
  • Configuration changes, e.g. changes to server settings
  • Changes to Content Security Rules

Connection to a single Kafka service is supported.

Configuration

Configuration is performed through the GUI with 'admin' privileges.

Connection

KafkaConnection.PNG

Paramater Value
Client ID A unique identifier for the Fusion Registry client. There's no real restrictions.
Host Hostname or IP address of the Kafka Service as defined by the Kafka administrator
Port Port number of the Kafka service
Comression Algorithm The algorithm to compress the payload. Choices are: None, GZIP, Snappy, LZ4.
Enable Kerberos Security If Enabled,the Producer attempts to authenticate with the specified Kerberos service for access to Kafka

Topics

KafkaTopics.PNG

Defines the topic(s) for the event type.

Kafka Topic is ID of the topic on which to publish. The Spring Kafka interface on which the functionality is based indicates that multiple topics can be specified separated by commas.

So 'FOO' publishes on just the single topic specified.
'FOO,BAR' publishes on both FOO and BAR topics.

Note that publication on multiple topics has not been comprehensively tested.

General Behaviour

The Producer sub-system will attempt to publish events to Kafka in real time. If the publication fails, the message is placed on an in-memory queue which an independent thread periodically attempts to process. The thread is resilient and is replaced on failure to ensure queue is never left un-monitored.

IMPORTANT: The in-memory queue is not persisted so changes to Registry structures that are yet to be sucessfully published to Kafka will be lost if the Registry service is stopped.

Structure Message Publication Behaviour

Kafka Message Key

The SDMX Structure URN is used as the Kafka Message Key. Consumers must therefore be prepared to receive and interpret the key as the Structure URN correctly when processing the message. This is particularly important for structure deletion where the Message Key is the only source of information about which structure is being referred to.

Additions and Modifications

Addition and Modification of structures both result in an SDMX 'replace' message containing the full content of the structures. Deltas (such as the addition of a Code to a Codelist) are not supported.

Deletions

Deletion of structures results in a 'tombstone' message, i.e. one with null payload but the URN of the deleted structure in the Message Key.

Kafka Transactions

The addition and / or modification of a number of structures in a single Registry process are encapsulated into a Kafka Transaction, as when processing an SDMX-ML message containing more than one structure.
Structure deletions are never mixed with additions / modifications in the same transaction. This reflects the behaviour of the SDMX REST API whereby additions / modifications use HTTP POST, while deletions require HTTP DELETE.

Full Content Publication

Under certain conditions, the Fusion Registry Producer will publish all of its structures in a single SDMX Structure Message of the chosen format to Kafka.
These are:

  1. When a new Kafka connection is set up, or the configuration of an existing Kafka connection is changed
  2. Registry startup

The principle here is to ensure that Consumers have a consistent baseline against which to process subsequent change notifications.

It also mitigates the risk that changes held in the Registry's in-memory queue of messages awaiting Kafka publication were lost on shutdown by forcing Consumers to re-synchronise on Registry startup.

Publication Arbitration on Fusion Registry Load Balanced Clusters

IMPORTANT: There is currently no arbitration mechanism between peer Fusion Registry instances configured in a Load Balanced cluster. This means that each member of the cluster independently publishes each structure change notification to Kafka, resulting in duplication of notifications.

Structure changes made on one instance of Fusion Registry in a cluster are replicated to all of the others through either a shared database polling mechanism, or by Rabbit MQ messaging. An instance receiving information that structures have changed elsewhere in the cluster updates its in-memory SDMX information model accordingly. The update event is however trapped by its Kafka notification processor and published as normal.

Changes are required to the Kafka event processing behaviour of Registries operating in a cluster to arbitrate over which instance publishes the notification on Kafka such that a structure change is notified once and only once, as expected.