Announcing Data Prepper 2.5.0

Wed, Oct 11, 2023 · Taylor Gray, Hai Yan

Data Prepper 2.5.0 is now available for download. This release includes a new OpenSearch source, new dissect and translate processors, and additions to the existing key-value processor.

OpenSearch source

The OpenSearch source is tailored for data migration and replication of OpenSearch clusters. While this is commonly done with snapshots, there are often incompatibilities between snapshots of different versions within or between OpenSearch and Elasticsearch. In combination with Data Prepper’s OpenSearch sink plugin, Data Prepper can now migrate all indexes, or just specific indexes, from one or more source clusters to one or more sink clusters. The OpenSearch source will continually detect new indexes in the source cluster that need to be processed and can even be scheduled to reprocess indexes at a configurable interval to pick up on new documents.

This is a great way to upgrade legacy Elasticsearch 7.x clusters to the latest OpenSearch versions and to migrate OpenSearch 2.x clusters to Amazon OpenSearch Serverless collections, which do not support native snapshots. Serverless collections can also be specified as the source cluster, which makes it possible to replicate and migrate indexes between serverless collections.

For more information, see the OpenSearch source documentation.

Translate processor

Data Prepper 2.5.0 introduces a new translate processor that modifies, or “translates,” a value in incoming events to a different value based on user-configured mappings. For example, you can translate an HTTP status code such as 404 to “Not Found” to make it readable. The processor supports regular expressions, number ranges, and comma-delimited values as mapping keys. Users also have the flexibility to define the mappings either directly in the pipeline configuration or through a file on the local machine or in a remote Amazon Simple Storage Service (Amazon S3) bucket.
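To make the mapping behavior concrete, here is a minimal Python sketch of the idea. It is not Data Prepper's implementation or configuration syntax: the `translate` function, its parameters, the lookup order, and the sample mappings are all assumptions made for illustration.

```python
import re


def translate(value, exact=None, ranges=None, patterns=None, default=None):
    """Return the mapped value for `value`, or `default` if nothing matches.

    Hypothetical helper: the three mapping styles and the order in which
    they are checked are assumptions, not Data Prepper behavior.
    """
    text = str(value)

    # 1. Exact keys. A comma-delimited key such as "401,403" covers
    #    several source values with one target.
    for key, target in (exact or {}).items():
        if text in [part.strip() for part in key.split(",")]:
            return target

    # 2. Inclusive numeric ranges, written here as (low, high) tuples.
    for (low, high), target in (ranges or {}).items():
        try:
            if low <= float(text) <= high:
                return target
        except ValueError:
            break  # the value is not numeric, so no range can match

    # 3. Regular expressions that must match the whole value.
    for pattern, target in (patterns or {}).items():
        if re.fullmatch(pattern, text):
            return target

    return default


# Sample mappings for HTTP status codes (illustrative values only).
status_names = {"404": "Not Found", "401,403": "Access Denied"}
status_ranges = {(500, 599): "Server Error"}
status_patterns = {r"2\d\d": "Success"}

print(translate(404, status_names, status_ranges, status_patterns))
```

In this sketch, a value that matches no mapping falls back to `default`, so the caller decides whether unmatched values are left empty or given a placeholder.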

Dissect processor

Data Prepper provides a suite of tools for parsing data of various types and structures, including the csv processor, parse_json processor, key_value processor, and Grok processor. In Data Prepper 2.5.0, we are introducing a new addition to this collection: the dissect processor. The dissect processor uses a predefined pattern to extract individual fields from log messages. It shares similarities with the Grok processor in terms of field extraction but is faster and simpler (regular expressions are not needed) in cases where each log line has the same set of fields separated by delimiters.
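The delimiter-based approach can be sketched in a few lines of Python. This is an illustration of the general technique under stated assumptions, not the processor's actual code: the `dissect` function, the `%{name}` field syntax used here, and the sample log line are hypothetical.

```python
import re  # used only to tokenize the pattern, not to parse log lines


def dissect(pattern, line):
    """Extract fields from `line` by walking the literal delimiters in `pattern`.

    Returns a dict of field values, or None if a delimiter is missing.
    Hypothetical helper; adjacent fields with no delimiter between them
    are not supported in this sketch.
    """
    # re.split with a capturing group yields literals at even indexes
    # and field names at odd indexes.
    parts = re.split(r"%\{(\w+)\}", pattern)
    literals, fields = parts[0::2], parts[1::2]

    if not line.startswith(literals[0]):
        return None
    pos = len(literals[0])

    result = {}
    for name, delimiter in zip(fields, literals[1:]):
        if delimiter:
            # The field runs up to the next occurrence of the delimiter.
            end = line.find(delimiter, pos)
            if end == -1:
                return None
        else:
            # A trailing field with no delimiter takes the rest of the line.
            end = len(line)
        result[name] = line[pos:end]
        pos = end + len(delimiter)
    return result


log_pattern = '%{client} [%{timestamp}] "%{method} %{path}" %{status}'
log_line = '192.0.2.10 [11/Oct/2023:10:15:32] "GET /index.html" 404'

print(dissect(log_pattern, log_line))
```

Because each field is located with plain substring searches, the work per line is a handful of `find` calls, which is the reason this style of parsing tends to be cheaper than regular expression matching when every line has the same layout.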

Other improvements

Data Prepper 2.5.0 includes a number of other improvements. We want to highlight a few of them.

  • The OpenSearch sink now supports update, upsert, and delete actions for bulk operations in addition to the existing create and index actions. Each action can also be paired with a condition that determines which action is taken for a given event.
  • The key_value processor now supports writing parsed values to the root of the event and adding tags to the event metadata when parsing fails.

Thanks to our contributors!