Example AWS Architecture: Streaming data to events

This repo contains a complete, working architecture on AWS that will take in streaming data, store it in S3 and DynamoDB, and then emit events with their contents.

The example intends to be a valid and truthful, yet not completely minimal, example of this pattern. The state of this work is "solid", but can (and should!) be improved and adapted for the actual needs you might have. Please see the Improvements and TODO sections for concise ideas and open suggestions.

Please note that with the trivial database structure in use, your might get database records that get overwritten if they are written at the the same epoch time. You can fix this by updating to a more resilient table format, for example by using a range key too.

How it is structured

The repo consists of three separate services (or stacks, if you will):

ingest: The main solution that handles intake of data and output of events.
consumer: A basic demo application that can be triggered by incoming events of a given type, as well as via a regular API Gateway.
generator: A utility application to generate test data and send it to the ingest solution, either using Kinesis Firehose or SQS.

This README contains the general project information. Please refer to the READMEs of the various services for more details on any specifics of the above.

How it works

Ingest context
- Take in streaming data via Kinesis Firehose (using "direct PUT events")
- Transform the data with Lambda (Transformer function)
- Automatically dump the transformed data into an S3 bucket ("data lake")
- Place the transformed data into a first-in-first-out SQS queue
- The queue will trigger a Lambda (EventStorer function) that will store the data in DynamoDB, similar to an event store
- DynamoDB has the Streams feature turned on, which will trigger a third Lambda (IntegrationEventEmitter function) on any database changes
- The integration event is emitted through EventBridge on a custom event bus
Consumer context
- A basic "consumer" service using Lambda (EventResponder function) will be triggered by the integration event
- API Gateway is also deployed in front of the service, so users can access the function/service

Properties of the solution

Well-known, high-scaling components
Mix of eventual (DynamoDB, Lambda) and strong consistency (Kinesis Firehose, FIFO queue)
Input constraint is equal to the limits on Kinesis Firehose
Output constraint is equal to the limits on EventBridge

Tooling

This project uses:

Terraform to describe (and be able to deploy) the infrastructure as code
Serverless Framework to describe (and be able to deploy) the application parts and services as code
TypeScript to write most of the application code
(Optional) Checkov annotations can be seen in the Terraform modules; not strictly required

Diagram

See the below diagram for how the overall flow moves through the various AWS components.

Prerequisites

Amazon Web Services (AWS) account with sufficient permissions so that you can deploy infrastructure. A naive but simple policy would be full rights for at least CloudWatch, Lambda, API Gateway, DynamoDB, X-Ray, SQS, Kinesis, EventBridge, and S3.
Recent Node version installed, ideally version 16 or later.
Terraform installed.
Ideally Terraform Cloud (local or remote workspace). If you're not using Terraform Cloud, remove the cloud block from ingest/infra/main.tf.

Configuration

Before making any changes to names and such, please do a "find-all" in the repo to find references to any values you might want to change. Search especially for 123412341234 and replace it with your AWS account number.

You should take a look at the Terraform modules (under ingest/infra/) and see that any settings, values, and regions are correct and/or optimized for your needs.

Also, if you're not using Terraform Cloud, remove the cloud block from ingest/infra/main.tf.

Deployment order

Deploy ingest
Deploy the Terraform infrastructure in ingest/infra
Deploy consumer
Run generator

Improvements and TODO

The use of Base64/ASCII/UTF8 conversions across the events/functions can likely be improved a lot
generator/index.js - The current way of turning data into an ArrayBuffer results in garbled data.
ingest/infra/modules/kinesis/main.tf - Customer-managed KMS?
ingest/infra/modules/s3/main.tf - Multi-region replication? TTL, S3 Glacier?
ingest/infra/modules/sqs/main.tf - Add DLQ (Dead Letter Queue)?
ingest/infra/modules/dynamodb/main.tf - Replicas (Global Tables) seem to create problems? (currently turned off)
ingest/src/Transformer/adapters/web/Transformer.ts - Ensure Kinesis config adds newline separators; see https://aws.amazon.com/blogs/big-data/build-seamless-data-streaming-pipelines-with-amazon-kinesis-data-streams-and-amazon-kinesis-data-firehose-for-amazon-dynamodb-tables/
ingest/src/EventStorer/adapters/web/EventStorer.ts: Might be faster to run const dynamo = createNewDynamoRepository() in global scope?
Tests?
Take incoming events and copy/store them as test data
DynamoDB: key field should have something else inserted into it?
DynamoDB: Use range key for timestamp?
EventBridge: Use "Resources" field?
EventBridge: Emit more events or for more details ("systems")?

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
consumer		consumer
diagrams		diagrams
generator		generator
ingest		ingest
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

consumer

consumer

diagrams

diagrams

generator

generator

ingest

ingest

.gitignore

.gitignore

README.md

README.md

Repository files navigation

Example AWS Architecture: Streaming data to events

How it is structured

How it works

Properties of the solution

Tooling

Diagram

Prerequisites

Configuration

Deployment order

Improvements and TODO

References

About

Releases

Languages

mikaelvesavuori/example-aws-stream-data-to-events

Folders and files

Latest commit

History

Repository files navigation

Example AWS Architecture: Streaming data to events

How it is structured

How it works

Properties of the solution

Tooling

Diagram

Prerequisites

Configuration

Deployment order

Improvements and TODO

References

About

Topics

Resources

Stars

Watchers

Forks

Languages