Skip to content

Aggregation query proxy is a scalable sidecar application that sits between a customer application and Amazon Keyspaces/DynamoDB. It allows you to run bounded aggregation queries against Amazon Keyspaces and DynamoDB services.

License

aws-samples/amazon-aggregation-query-proxy

Aggregation-query-proxy

There are use-cases when you need to aggregate bounded result sets for a short time period from DynamoDB or Keyspaces, for example, an hourly or daily report, including all hourly or daily sales. However, Amazon DynamoDB and Keyspaces do not support the commonly seen SQL aggregation constructs such as COUNT, SUM, MIN, MAX, and GROUP BY, as aggregation queries with unbound number of partitions might take unpredictable time to execute. Because of this constraint, it is better to preprocess the operational data to do the aggregation, and storage of the processed data in Amazon DynamoDB/Keyspaces. This pattern provides a solution by placing a scalable aggregation proxy (sidecar) between your application and DynamoDB/Keyspaces.

The aggregation-query-proxy (AQP) consists of a scalable proxy layer that sits between your application and Amazon Keyspaces/DynamoDB.

It provides intermediate aggregation logic which allows existing application to execute aggregation queries against Amazon DynamoDB/Keyspaces.

The AQP converts the provided aggregation query (SQL-92) to a plain request (CQL/DDBPartiQL). After the plain response (json) has been received the AQP uses IonEngine to aggregate the plain response into the final result set in json format.

alt text

Create your yaml based on the template:

cp conf/keyspaces-aggregation-query-proxy.yaml.template conf/keyspaces-aggregation-query-proxy.yaml

Configure and build the app with Amazon Keyspaces

Set dataBaseName to KEYSPACES Set pathToKeyspacesConfigFile to /usr/app

Configure DataStax conf file

Prepare DataStax java driver conf file

Build the project

mvn install build.sh <AWS_ACCOUNT> <REGION>

Start a docker container with the app

docker run -it -p 8080:8080 simple-aggregation-query-app

Limitations

As a best practice we recommend executing bounded Amazon Keyspaces (CQL) or DynamoDB (PartiQL) requests against the Aggregation Query Proxy. In all cases, avoid unbounded aggregations queries (without WHERE clause). Unbounded aggregation queries might lead to unpredictable execution time, high JVM memory pressure on AQP nodes (OOM), or high Amazon DynamoDB/Keyspaces RCUs consumption.

Configure and build the app with Amazon DynamoDB

Set dataBaseName to DYNAMODB Set dynamoRegion to us-east-1

Build the project

mvn install build.sh <AWS_ACCOUNT> <REGION>

Start a docker container with the app

docker run -it -p 8080:8080 --env AWS_REGION="us-east-1" --env AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID --env AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY --env AWS_SESSION_TOKEN=$AWS_SESSION_TOKEN simple-aggregation-query-app

Let's execute a simple aggregation query against Amazon Keyspaces

AUTH_BASIC=$(echo -n large-query-app:your_secret | base64)

http --follow --timeout 3600 GET 'http://0.0.0.0:8080/query-aggregation/select count(book_title) as books, award, avg(rank) as avg_rang from keyspaces_sample.keyspaces_sample_table GROUP BY award' Authorization:'Basic '$(AUTH_BASIC)

HTTP/1.1 200 OK Cache-Control: no-transform, max-age=60 Content-Encoding: gzip Content-Length: 195 Content-Type: application/json Date: Wed, 04 May 2022 01:59:16 GMT Vary: Accept-Encoding

{
    "response": [{"resultSet":[{"books":3,"award":"Kwesi Manu Prize","avg_rang":2e0},
                                 {"books":3,"award":"Richard Roe","avg_rang":2e0},
                                 {"books":3,"award":"Wolf","avg_rang":2e0}]}],
    "stats": {
        "elapsedTimeToAggregateDataInMs": 361,
        "elapsedTimeToRetrieveDataInMs": 120,
        "payloadSizeBytes": 626
    }
}

Let's execute a simple aggregation query against Amazon DynamoDB

http --auth-type basic -a large-query-app:secretEXAMPLE --follow --timeout 3600 GET 'http://0.0.0.0:8080/query-aggregation/select zipcode,pk,sum(amount) as total from "your_table" where pk in (%27ACCOUNT%23ACCOUNT40%23CUSTOMER%23CUSTOMER33%27, %27ACCOUNT%23ACCOUNT1%23CUSTOMER%23CUSTOMER61%27) group by zipcode, pk'

HTTP/1.1 200 OK Cache-Control: no-transform, max-age=60 Content-Encoding: gzip Content-Length: 229 Content-Type: application/json Date: Tue, 14 Jun 2022 14:57:15 GMT Vary: Accept-Encoding

{
    "response": [
        {
            "resultSet": [
                {
                    "pk": "ACCOUNT#ACCOUNT1#CUSTOMER#CUSTOMER61",
                    "total": 133.9894140356888,
                    "zipcode": 74545
                },
                {
                    "pk": "ACCOUNT#ACCOUNT40#CUSTOMER#CUSTOMER33",
                    "total": 4321.055836431855,
                    "zipcode": 56624
                }
            ]
        }
    ],
    "stats": {
        "elapsedTimeToAggregateDataInMs": 1210,
        "elapsedTimeToRetrieveDataInMs": 1346,
        "payloadSizeBytes": 194
    }
}

Local DynamoDB for JUnit test

Set localDDB to true to run all JUnit tests against DynamoDB

License

This project is licensed under the MIT-0

About

Aggregation query proxy is a scalable sidecar application that sits between a customer application and Amazon Keyspaces/DynamoDB. It allows you to run bounded aggregation queries against Amazon Keyspaces and DynamoDB services.

Topics

Resources

License

Code of conduct

Security policy

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published