
AI Powered Text Insights

This package includes a prototype sample to help you gain insights into how your customers interact with your brand on social media. By combining zero-shot text classification, sentiment analysis, and keyword extraction, it obtains real-time insights from posts on Twitter and presents them in a dashboard. The solution consists of a tweet processing pipeline (using AWS Lambda) that classifies tweets into one of the categories defined at inference time (zero-shot classification) by calling a serverless SageMaker endpoint running a Hugging Face model. Classified tweets are then processed with Amazon Comprehend to extract sentiment and key phrases. Amazon Lookout for Metrics performs anomaly detection on the volume of tweets per category per period of time, and notifications are sent when anomalies are detected. All insights are presented on an Amazon QuickSight dashboard.
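For orientation, the sketch below shows the shape of the per-tweet processing step: one zero-shot classification call to the SageMaker endpoint followed by two Amazon Comprehend calls. The endpoint name and candidate labels are hypothetical placeholders, not the deployed stack's actual values:

# Illustrative sketch only; the deployed Lambda code differs in detail.
import json

import boto3

sagemaker_runtime = boto3.client("sagemaker-runtime")
comprehend = boto3.client("comprehend")

def process_tweet(text, labels=("pricing", "support", "outage", "praise")):
    # Zero-shot classification: candidate labels are supplied at inference
    # time, so categories can change without retraining the model.
    response = sagemaker_runtime.invoke_endpoint(
        EndpointName="zero-shot-endpoint",  # hypothetical endpoint name
        ContentType="application/json",
        Body=json.dumps({
            "inputs": text,
            "parameters": {"candidate_labels": list(labels)},
        }),
    )
    classification = json.loads(response["Body"].read())

    # Sentiment and key phrases via Amazon Comprehend (English assumed here).
    sentiment = comprehend.detect_sentiment(Text=text, LanguageCode="en")
    phrases = comprehend.detect_key_phrases(Text=text, LanguageCode="en")

    return {
        "category": classification["labels"][0],  # highest-scoring label
        "sentiment": sentiment["Sentiment"],
        "key_phrases": [p["Text"] for p in phrases["KeyPhrases"]],
    }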

The sample application includes some backend resources (backend directory) and a container that gets tweets from a Twitter stream (stream-getter directory).
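At a high level, the container's behavior can be sketched as below, assuming the BEARER_TOKEN and SQS_QUEUE_URL environment variables configured in the service manifest later in this README; the real implementation (main.py) also handles details such as reconnection and backoff:

# Minimal sketch of the stream-getter loop (illustrative, not the
# repository's actual code).
import os

import boto3
import requests

sqs = boto3.client("sqs")

def run():
    # Connect to the Twitter filtered stream using the Bearer token.
    response = requests.get(
        "https://api.twitter.com/2/tweets/search/stream",
        headers={"Authorization": f"Bearer {os.environ['BEARER_TOKEN']}"},
        stream=True,
    )
    response.raise_for_status()

    # Forward every matching tweet to the processing pipeline's SQS queue.
    for line in response.iter_lines():
        if line:  # the stream sends empty keep-alive lines
            sqs.send_message(
                QueueUrl=os.environ["SQS_QUEUE_URL"],
                MessageBody=line.decode("utf-8"),
            )

if __name__ == "__main__":
    run()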

Deployment instructions

Deploying the sample application builds the following environment in the AWS Cloud:

[Architecture diagram]

Prerequisites

To deploy this sample you need, at a minimum:

- An AWS account and AWS credentials configured locally
- The AWS SAM CLI and Docker (sam build --use-container builds inside a container)
- The AWS Copilot CLI
- A Twitter developer account and a Bearer token for the Twitter API v2
- An Amazon QuickSight subscription in your account, for the dashboard

Backend resources

Run the command below, from within the backend/ directory, to deploy the backend:

sam build --use-container && sam deploy --guided

Follow the prompts. NOTE: Due to a Lookout for Metrics constraint on database names, name your stack so that it matches the regular expression pattern [a-zA-Z0-9_]+

The command above deploys an AWS CloudFormation stack in your AWS account. You will need the stack's output values to deploy the Twitter stream getter container.

1. Data format

This solution writes the tweet insights, stored as JSON files, to two S3 locations, /tweets and /phrases, in the results bucket whose name appears in the CloudFormation stack's outputs under "TweetsBucketName". Under the /tweets and /phrases folders, data is organized by day, following the YYYY-MM-dd 00:00:00 datetime format.
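For illustration, object keys in the bucket then follow a layout like the one below (bucket name, dates, and file names are placeholders):

s3://<TweetsBucketName>/tweets/2022-06-01 00:00:00/<file>.json
s3://<TweetsBucketName>/phrases/2022-06-01 00:00:00/<file>.json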

Sample output files can be found in this repository under the /sample_files folder.

2. Activate the Lookout for Metrics detector

The prototype is deployed with the anomaly detector disabled so that you can first provide historical data, which reduces the detector's learning time.

If you have historical data in the same format as the data generated by this solution, you can move it to the data S3 bucket created by the backend deployment (TweetsBucketName). Make sure to follow the format of the files in the /sample_files folder.

Follow the instructions to activate your detector; the detector's name can be found in the CloudFormation stack's outputs.
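If you prefer the AWS CLI, a detector can also be activated as sketched below, replacing <DETECTOR_ARN> with your detector's ARN:

aws lookoutmetrics activate-anomaly-detector \
    --anomaly-detector-arn <DETECTOR_ARN>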

Optionally, you can configure alerts for your anomaly detector. Follow the instructions to create an alert that sends notifications to SNS; the SNS topic name is part of the CloudFormation stack's outputs.
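As a sketch, such an alert can also be created from the AWS CLI; the alert name and sensitivity threshold below are illustrative, and <ROLE_ARN> must reference a role that allows Lookout for Metrics to publish to the topic:

aws lookoutmetrics create-alert \
    --alert-name tweets-volume-alert \
    --alert-sensitivity-threshold 70 \
    --anomaly-detector-arn <DETECTOR_ARN> \
    --action '{"SNSConfiguration": {"RoleArn": "<ROLE_ARN>", "SnsTopicArn": "<TOPIC_ARN>"}}'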

Twitter stream getter container

Run the commands below, from within the stream-getter/ directory, to deploy the container application:

1. Create application

copilot app init twitter-app

2. Create environment

copilot env init --name test --region <BACKEND_STACK_REGION>

Replace <BACKEND_STACK_REGION> with the same region to which you deployed the backend resources previously.

Follow the prompts accepting the default values.

The above command provisions the required network infrastructure (VPC, subnets, security groups, and more). In its default configuration, Copilot follows AWS best practices and creates a VPC with two public and two private subnets in different Availability Zones (AZs). For security reasons, we'll soon configure the placement of the service as private. Because of that, the service will run on the private subnets and Copilot will automatically add NAT Gateways, but NAT Gateways increase the overall cost. In case you decide to run the application in a single AZ to have only one NAT Gateway (not recommended), you can run the following command instead:

copilot env init --name test --region <BACKEND_STACK_REGION> \
    --override-vpc-cidr 10.0.0.0/16 --override-public-cidrs 10.0.0.0/24 --override-private-cidrs 10.0.1.0/24

Note: The current implementation is designed to run only one container at a time. Not only would your Twitter account need to allow more than one simultaneous stream connection, but the application would also have to be modified to handle other complexities, such as duplicates (learn more in Recovery and redundancy features). Even though only one container runs at a time, having two AZs is still recommended: if one AZ goes down, ECS can run the application in the other.

3. Deploy the environment

copilot env deploy --name test

4. Create service

copilot svc init --name stream-getter --svc-type "Backend Service" --dockerfile ./Dockerfile

5. Create secret to store the Twitter Bearer token

copilot secret init --name TwitterBearerToken

When prompted to provide the secret, paste the Twitter Bearer token.
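Copilot stores the secret as a SecureString in AWS Systems Manager Parameter Store, under the path the service manifest references below. Assuming the application and environment names used in this walkthrough, you can verify it with:

aws ssm get-parameter \
    --name /copilot/twitter-app/test/secrets/TwitterBearerToken \
    --with-decryption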

6. Edit service manifest

Open the file copilot/stream-getter/manifest.yml and change its content to the following:

name: stream-getter
type: Backend Service

image:
  build: Dockerfile

cpu: 256
memory: 512
count: 1
exec: true

network:
  vpc:
    placement: private

variables:
  SQS_QUEUE_URL: <SQS_QUEUE_URL>
  LOG_LEVEL: info

secrets:
  BEARER_TOKEN: /copilot/${COPILOT_APPLICATION_NAME}/${COPILOT_ENVIRONMENT_NAME}/secrets/TwitterBearerToken

Replace <SQS_QUEUE_URL> with the URL of the SQS queue deployed in your AWS account.

You can use the following command to get the value from the backend AWS CloudFormation stack outputs (replace <BACKEND_STACK_NAME> with the name of your backend stack):

aws cloudformation describe-stacks --stack-name <BACKEND_STACK_NAME> \
    --query "Stacks[].Outputs[?OutputKey=='TweetsQueueUrl'][] | [0].OutputValue"

7. Add permission to write to the queue

Create a new file in copilot/stream-getter/addons/ called sqs-policy.yaml with the following content:

Parameters:
  App:
    Type: String
    Description: Your application's name.
  Env:
    Type: String
    Description: The environment name your service, job, or workflow is being deployed to.
  Name:
    Type: String
    Description: The name of the service, job, or workflow being deployed.

Resources:
  QueuePolicy:
    Type: AWS::IAM::ManagedPolicy
    Properties:
      PolicyDocument:
        Version: 2012-10-17
        Statement:
          - Sid: SqsActions
            Effect: Allow
            Action:
              - sqs:SendMessage
            Resource: <SQS_QUEUE_ARN>

Outputs:
  QueuePolicyArn:
    Description: The ARN of the ManagedPolicy to attach to the task role.
    Value: !Ref QueuePolicy

Replace <SQS_QUEUE_ARN> with the ARN of the SQS queue deployed in your AWS account.

You can use the following command to get the value from the backend AWS CloudFormation stack outputs (replace <BACKEND_STACK_NAME> with the name of your backend stack):

aws cloudformation describe-stacks --stack-name <BACKEND_STACK_NAME> \
    --query "Stacks[].Outputs[?OutputKey=='TweetsQueueArn'][] | [0].OutputValue"

After that, your directory should look like the following:

.
├── Dockerfile
├── backoff.py
├── copilot
│     ├── stream-getter
│     │    ├── addons
│     │    │     └── sqs-policy.yaml
│     │    └── manifest.yml
│     └── environments
│          └── test
│               └── manifest.yml
├── main.py
├── requirements.txt
├── sqs_helper.py
└── stream_match.py

8. Deploy service

IMPORTANT: The container connects to the Twitter stream as soon as it starts after the service is deployed. Your Twitter stream rules must be configured before connecting to the stream, so if you haven't configured them yet, do so before proceeding (see Rules examples for filtered stream below).

copilot svc deploy --name stream-getter --env test

When the deployment finishes, you should have the container running inside ECS. To check the logs, run the following:

copilot svc logs --follow

Visualize your insights with Amazon QuickSight

To create some example visualizations from the processed text data, follow the instructions in the Creating visualizations with QuickSight.pdf file.

Rules examples for filtered stream

Twitter provides endpoints that enable you to create and manage rules and apply them to a filtered stream, which returns public tweets matching those rules in real time.

For instance, the following is a rule that returns tweets from the accounts @awscloud, @AWSSecurityInfo, and @AmazonScience:

from:awscloud OR from:AWSSecurityInfo OR from:AmazonScience

To add that rule, issue a request like the following, replacing <BEARER_TOKEN> with the Twitter Bearer token:

curl -X POST 'https://api.twitter.com/2/tweets/search/stream/rules' \
-H "Content-type: application/json" \
-H "Authorization: Bearer <BEARER_TOKEN>" -d \
'{
  "add": [
    {
      "value": "from:awscloud OR from:AWSSecurityInfo OR from:AmazonScience",
      "tag": "news"
    }
  ]
}'
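To verify which rules are currently active on the stream, you can issue a GET request to the same endpoint (again replacing <BEARER_TOKEN> with the Twitter Bearer token):

curl 'https://api.twitter.com/2/tweets/search/stream/rules' \
-H "Authorization: Bearer <BEARER_TOKEN>"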

Clean up

If you don't want to continue using the sample, clean up its resources to avoid further charges.

Start by deleting the backend AWS CloudFormation stack, which in turn removes the underlying resources it created, and then delete all the resources AWS Copilot set up for the container application. To do so, run the following commands:

sam delete --stack-name <sam stack name>
copilot svc delete --name stream-getter
copilot env delete --name test
copilot app delete

Security

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.