Front-End Web & Mobile

How to connect your GraphQL API to AWS data sources

What’s a resolver?

A GraphQL Resolver is a function or method that resolves a value for a type or field within a schema.

A resolver is the key architectural component that connects GraphQL fields, graph edges, queries, mutations, and subscriptions to their respective data sources and microservices. In this post, we’ll focus on how to build GraphQL resolvers for AWS data sources, such as Amazon DynamoDB, Amazon Aurora, and Amazon Relational Database Service (Amazon RDS), or compute with AWS Lambda. Custom resolvers extend the default resolution behavior that many GraphQL servers provide, such as Apollo Server, Express GraphQL, GraphQL Java, Hot Chocolate .NET, and AWS AppSync, a fully managed GraphQL service.

Self-Managed GraphQL Servers

Although many popular open source GraphQL servers exist, in this post we’ll use Apollo Server as a reference point for building resolvers with JavaScript (or TypeScript) to connect a self-managed, open source GraphQL server to different AWS data sources.

When using JavaScript, a resolver at its most minimal looks something like the following for a query:

const resolvers = {
    Query: {
        hello: () => { return 'Hello world!' },
    },
};

For a mutation resolver, the code would look something like this:

Mutation: {
    login: async (_, { email }, { dataSources }) => {
        const user = await dataSources.userAPI.findOrCreateUser({ email });

        if (user) {
            user.token = Buffer.from(email).toString('base64');
            return user;
        }
        return null; // No user could be found or created.
    },
},

This example around a login scenario shows how we can immediately see a resolver becoming more complex. Moreover, we can see how you might need it to reach out to other systems, call other functions, and perform other calls to build the return object for the query or mutation.
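Because Apollo resolvers are plain functions that receive four positional arguments (the parent object, the field arguments, the shared context, and an info object), they can be exercised directly with a stubbed data source, with no server running. The following sketch restates the login resolver and calls it against a hypothetical `userAPI` stub:

```javascript
// Resolvers are plain functions, so they can be invoked directly in tests.
const resolvers = {
  Mutation: {
    // parent (_), args, and context follow Apollo's positional convention.
    login: async (_, { email }, { dataSources }) => {
      const user = await dataSources.userAPI.findOrCreateUser({ email });
      if (!user) return null;
      // Encode a simple opaque token from the email, as in the example above.
      user.token = Buffer.from(email).toString('base64');
      return user;
    },
  },
};

// Exercise the resolver with a stubbed data source (the userAPI shape here
// is hypothetical -- match it to your real data source class).
const fakeContext = {
  dataSources: {
    userAPI: { findOrCreateUser: async ({ email }) => ({ id: 1, email }) },
  },
};

resolvers.Mutation.login(null, { email: 'a@b.com' }, fakeContext)
  .then((user) => console.log(user.token)); // base64 of the email
```

Keeping resolvers testable in this way pays off as they grow more complex and reach out to more systems.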

Now let’s take the next steps and discuss our AWS data source options.

DynamoDB

Some tooling and libraries present you with low-level primitives for building out driver access. For some data source connections, building a solution on these primitives is a great option. However, for AWS data sources, use the SDK for your connections. This is especially true for DynamoDB, as the SDK provides many features for interacting with your DynamoDB tables and data.

In the following snippet, we can see the Apollo library and the SDK in use to perform several key steps: configuring the connection, instantiating the DocumentClient, and then running a put against the database.

var AWS = require("aws-sdk");

AWS.config.update({
  region: "us-west-2",
  endpoint: "http://wherever.your.location/is"
  // Security keys, tokens, etc., will need to be set up and stored securely for the connection.
});

var docClient = new AWS.DynamoDB.DocumentClient();

var params = {
    TableName: "MoviesTable",
    Item: {
        "year": 2025,
        "title": "The Big New Movie"
    }
};

Mutation: {
    createMovie: async (_, args, { dataSources }) => {
        try {
            // Await the promise form so that the resolver returns the result.
            const data = await docClient.put(params).promise();
            // Item added here.
            console.log("Added item:", JSON.stringify(data, null, 2));
            return data;
        } catch (err) {
            // Item failed; handle the error and respond with a pertinent response.
            console.error("Unable to add item. Error JSON:", JSON.stringify(err, null, 2));
            throw err;
        }
    },
}

Further key resources to build out your access and usage plan include the DynamoDB Documentation Developer Guide and the Apollo Documentation.

Amazon RDS and Aurora

For relational databases, Amazon RDS and Aurora are great options, with support for PostgreSQL, MySQL, SQL Server, and other SQL engines. The following example shows what an Apollo resolver would look like with a PostgreSQL database, using the pg client for Node.js and the SDK.

const { Pool } = require('pg')
const { RDS } = require('aws-sdk')

const signerOptions = {
  credentials: {
    accessKeyId: process.env.youraccesskey,
    secretAccessKey: process.env.yoursecretaccesskey,
  },
  region: 'us-east-2',
  hostname: 'example.hostname.us-east-2.rds.amazonaws.com',
  port: 5432,
  username: 'postgres-api-account',
}

const signer = new RDS.Signer()
const getPassword = () => signer.getAuthToken(signerOptions)

const pool = new Pool({
  host: signerOptions.hostname,
  port: signerOptions.port,
  user: signerOptions.username,
  database: 'my-db',
  password: getPassword,
})

var insertQuery = 'INSERT INTO MoviesTable(title, year) VALUES($1, $2) RETURNING *';
var queryValues = ['The Big New Movie', '2025'];

Mutation: {
    createMovie: async (_, args, { dataSources }) => {
        try {
            const res = await pool.query(insertQuery, queryValues);
            // Item added; handle response data and other processing here.
            console.log("Added item:", JSON.stringify(res.rows[0], null, 2));
            return res.rows[0];
        } catch (err) {
            // Insert failed; handle the error and respond with a pertinent response.
            console.error("Unable to add item. Error JSON:", JSON.stringify(err, null, 2));
            throw err;
        }
    },
}

This example adds several things specific to Amazon RDS and PostgreSQL. The connection pooling that a relational database client performs – whether for PostgreSQL, MySQL, SQL Server, or another engine – is an important part of the connection that must be managed.
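Conceptually, a pool keeps a capped set of reusable connections that are handed out to resolvers and returned afterward, so that each resolver call doesn’t pay the cost of a new database connection. The following sketch models only that idea; it is not the pg Pool implementation (real pools also queue waiting callers instead of throwing):

```javascript
// Conceptual sketch of what a connection pool manages: a capped set of
// reusable connections. NOT the pg library's implementation.
class TinyPool {
  constructor(createConn, max = 5) {
    this.createConn = createConn;
    this.max = max;
    this.idle = [];   // connections available for reuse
    this.total = 0;   // connections created so far
  }

  acquire() {
    if (this.idle.length > 0) return this.idle.pop(); // reuse an idle connection
    if (this.total < this.max) {
      this.total += 1;
      return this.createConn();                       // grow up to the cap
    }
    throw new Error('pool exhausted');                // real pools queue the caller
  }

  release(conn) {
    this.idle.push(conn);                             // return the connection for reuse
  }
}

// One physical connection serves many resolver calls:
let opened = 0;
const tiny = new TinyPool(() => ({ id: ++opened }), 2);
const a = tiny.acquire();
tiny.release(a);
const b = tiny.acquire(); // the same underlying connection, reused
```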

For more information about writing queries against Amazon RDS and Aurora, check out the documentation on Amazon RDS Proxy. And, for more on PostgreSQL pooling for pg, check out the Pooling documentation.

Lambda

The next option isn’t a data source specifically, but it could provide a bridge to almost any data source. Connecting and invoking a Lambda function follows several steps. The following example looks at the key parts of making a Lambda call from a resolver using the SDK.

Here, the parameters are built up and passed to the Lambda function and the call within the resolver.

var AWS = require("aws-sdk");

AWS.config.update({
  region: "us-west-2",
  endpoint: "http://wherever.your.location/is"
  // Security keys, tokens, etc., will need to be set up and stored securely for the connection.
});

const lambda = new AWS.Lambda();

const params = {
    FunctionName: 'my-lambda-function',
    // The payload must be a JSON string.
    Payload: JSON.stringify({ title: 'The Big New Movie', year: 2025 }),
};

Mutation: {
    createMovie: async (_, args, { dataSources }) => {
        try {
            const result = await lambda.invoke(params).promise();
            // Act on the response here; the function's return value arrives
            // as a JSON string in result.Payload.
            return JSON.parse(result.Payload);
        } catch (err) {
            // Act on the err here to pass the err.stack back as an appropriate
            // GraphQL error.
            throw err;
        }
    },
}

Tips for Building Resolvers for Self-Managed GraphQL Servers

Use the SDK – Lean heavily on the SDK for all of the calls to AWS resources. This will help you make sure that you have a consistent, reliable, and maintained access method for your API and the respective data sources.

For example, the SDK provides a more consistent way to set up configuration; pass secrets for connections; handle errors, retries, and exponential backoff; and manage other important aspects across your code base. Using the SDK also makes it easier to settle on a particular access style, such as async/await, promises, or callbacks. There are many additional reasons to use the SDK beyond these immediate advantages.

var AWS = require("aws-sdk");

AWS.config.update({
  region: "us-west-2",
  endpoint: "http://wherever.your.location/is"
  // Security keys, tokens, etc., will need to be set up and stored securely for the connection.
});
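The retry handling mentioned above follows an exponential backoff pattern: each failed attempt roughly doubles the wait before the next try, up to a cap. The sketch below illustrates the delay schedule only; the SDK’s actual algorithm also adds jitter, and its base and cap vary by service and SDK version:

```javascript
// Sketch of an exponential backoff delay schedule (illustrative only; the
// AWS SDK computes something similar internally, with jitter added).
function backoffDelays(attempts, baseMs = 100, capMs = 2000) {
  const delays = [];
  for (let attempt = 0; attempt < attempts; attempt += 1) {
    // Double the wait on every retry, but never exceed the cap.
    delays.push(Math.min(capMs, baseMs * 2 ** attempt));
  }
  return delays;
}

console.log(backoffDelays(6)); // [ 100, 200, 400, 800, 1600, 2000 ]
```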

Naming, Parameterization, and Clear Code – Name operations, parameters, and other passed values so that they map to their respective database queries across the layers of concern.

For example, if the GraphQL query is to get movies, and it looks like the following example, then make sure that the respective query on the database side matches that naming. The table in the database should be named movie or movies, depending on the convention. This leads to better understanding and readability when determining which GraphQL query belongs to which database query, database table, and so on. It prevents the multiple layers – GraphQL query or mutation, entity, and fields – from getting confused or falling out of synchronization with the database. Furthermore, it provides an easier path to automation if code generation or other technologies are used to create your resolvers.

query GetMovies {
    movies {
        title
        year
    }
}

Request and Response Connections, Pooling, and Fire and Forget – Determine a strategy and stick to that approach to maintain a clear tactical understanding of what behavior to expect from the API.

For example, a good practice to follow is writing code against a determined connection method, based on desired outcomes. If a database is set up and you intend to use something like Amazon RDS Proxy to implement connection pooling, then make sure that you write driver code for the GraphQL resolvers that matches that architectural decision. If a client connection must fire and forget, then take that into account. Otherwise, if a client must set up a connection and interact with multiple queries per resolver, then that must be part of the design patterns used. This prevents resolvers from being written in ways that deviate from expected usage, which would produce few or no errors or warnings and be almost impossible to debug.

Determine Authorization Upfront – Determine what strategy will be used, what the expected results and behavior are, and what tooling will be used before starting to design your API.

  async theUser(_, args, context) {
    if (!context.isAuthenticated) {
        // Handle not authenticated.
    }
    if (!context.isInRole(args.role)) {
        // Handle not being in role.
    }
    // Otherwise, continue resolving for the authorized user.
  }

For example, if you want row- or field-level authorization of data in your API, then this decision must be known up front to make the correct choices about tooling. For some scenarios, using Amazon Cognito might be perfect; for others, perhaps just a simple authentication mechanism; and for others still, something completely different might be needed. Deciding on this up front lets you choose the right tooling, such as GraphQL directives, and it will prevent project restarts.
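One way to keep that strategy consistent once it is decided is to centralize the checks in a wrapper that runs before any resolver body. The sketch below assumes a hypothetical context shape (`isAuthenticated`, `roles`); match it to whatever your chosen auth tooling actually places on the context:

```javascript
// Wrap a resolver so the authentication/authorization decision runs first.
// The context fields used here (isAuthenticated, roles) are hypothetical.
function requireRole(role, resolver) {
  return (parent, args, context, info) => {
    if (!context.isAuthenticated) {
      throw new Error('Not authenticated');
    }
    if (!context.roles || !context.roles.includes(role)) {
      throw new Error(`Missing required role: ${role}`);
    }
    // Only an authenticated caller in the right role reaches the body.
    return resolver(parent, args, context, info);
  };
}

// Usage: the policy is applied uniformly to every guarded resolver.
const theUser = requireRole('reader', (_, args, context) => ({ id: args.id }));
```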

Query Only Selected Fields and Generate Your SQL/Query Side – Determine how to make sure that only the fields being requested are being queried for, and that they only return what is asked for.

For example, if your query reads like the following, then you’re only asking for field1 and field2 of theEntity to be returned.

query anExtremelyMinimalQuery {
    theEntity {
        field1
        field2
    }
}

If the database must pull data from a singular table, then the SQL query would look something like the following.

SELECT field1, field2 FROM theEntityTable

However, if a join must occur, then it might look like this:

SELECT firstEntity.field1, secondEntity.field2 
FROM firstEntity 
INNER JOIN secondEntity ON firstEntity.ID = secondEntity.ID

In these cases, the queries must be written or generated to include only the elements needed for the GraphQL query. Ideally, they’re also written or generated in a way that is performant. Not doing so can lead to performance issues, or to data being retrieved and never used, which makes for an inefficient GraphQL query or mutation.
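The first half of that generation step can be sketched as a small helper that builds a SELECT naming only the requested fields. In a real resolver, the field list would come from the resolver’s `info` argument (e.g., `info.fieldNodes`), and the identifiers are assumed to be validated against the schema, never raw user input:

```javascript
// Sketch: build a SELECT containing only the requested fields. Field and
// table names are assumed to be schema-validated identifiers, not raw input.
function selectOnly(table, fields) {
  if (fields.length === 0) throw new Error('no fields requested');
  return `SELECT ${fields.join(', ')} FROM ${table}`;
}

console.log(selectOnly('theEntityTable', ['field1', 'field2']));
// SELECT field1, field2 FROM theEntityTable
```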

Managed GraphQL APIs with AWS AppSync

As an alternative to self-managed GraphQL servers, managed solutions provide a great way to shift many operational concerns and reorient focus toward organizational use cases. A managed solution such as AWS AppSync means that the infrastructure, patching, scaling, availability, and other operational concerns are managed for you. Therefore, your team can focus its resources and developers on the organizational use case – business rules, processing, transactions, e-commerce, and related work.

AWS AppSync provides a serverless GraphQL API with optimized resolvers to connect to AWS data sources, compute, and others. Now, let’s take the next steps and discuss our AWS data source options.

DynamoDB

A schema can be used to generate resolvers, which in AWS AppSync are called unit resolvers. These provide a direct resolver connection to DynamoDB. A unit resolver runs an action against the database using a request mapping template and a response mapping template written in Apache Velocity Template Language (VTL). The request template takes the request as input and outputs a JSON document with instructions for the resolver. To learn more about unit resolvers, work through the how to configure resolvers developer guide.

For more on VTL use in AWS AppSync, check out the “Resolver Mapping Template Programming Guide”, and learn about alternatives to VTL with the Lambda resolvers discussed later in this post.

If you need multiple operations to be run against one or more data sources in a single client request, then change the unit resolver to a pipeline resolver. Pipeline resolver operations are run in order against your data source(s). Furthermore, you can create functions to run during these operations. This opens up a wide range of sequential operational options during the execution of your resolvers. To dive deeper, check out this tutorial on pipeline resolvers.
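Conceptually, a pipeline runs its functions in sequence, with each function seeing the shared context and the previous function’s result, and the last result becoming the response. The sketch below models only that sequencing in plain JavaScript; AppSync itself executes real pipeline functions (written as mapping templates or resolver code) against configured data sources:

```javascript
// Conceptual model of pipeline sequencing: each function receives the shared
// context and the previous step's result. This is NOT AppSync's runtime.
async function runPipeline(functions, ctx) {
  let prev = null;
  for (const fn of functions) {
    // Each step can read ctx and build on the prior step's output.
    prev = await fn(ctx, prev);
  }
  return prev; // the last function's output becomes the response
}

// Hypothetical example: look up a user, then enrich with their orders.
const pipeline = [
  async (ctx) => ({ userId: ctx.args.id }),
  async (ctx, prev) => ({ ...prev, orders: [`order-for-${prev.userId}`] }),
];
```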

Amazon RDS and Aurora

AWS AppSync also provides resolver options using Apache VTL (Velocity Template Language) to connect to Aurora Serverless. For example, an insert of data, with the respective return data for the GraphQL response, would look like the following. This code would be added to the request mapping template.

#set($id=$utils.autoId())
{
    "version": "2018-05-29",
    "statements": [
        "insert into Pets VALUES ('$id', '$ctx.args.input.type', $ctx.args.input.price)",
        "select * from Pets WHERE id = '$id'"
    ]
}

Then, for the response mapping template, the following code would complete the resolver:

$utils.toJson($utils.rds.toJsonObject($ctx.result)[1][0])

For an example of connecting resolvers to Aurora Serverless options, build a pet database in this tutorial.

Direct Lambda Resolver

Direct Lambda resolvers provide a way for the AWS AppSync service to call Lambda functions. These resolvers can call a Lambda function with an initiating request mapping template and a correlating response mapping template. However, both can be turned off, in which case the resolver uses the context object as a convention, removing the need for request and response mapping VTL. The context object is passed directly to a Lambda function as an Invoke operation. This alleviates the need for any VTL, and you can use whatever language Lambda supports to connect to standalone RDS database servers, Amazon Neptune, Amazon Kinesis, or any number of other sources of processing or data storage.

For further details on Direct Lambda resolvers, check out the developer guide on configuring a Direct Lambda resolver. For a tutorial on invoking a Lambda function with a standard Lambda resolver, give building out resolvers for AWS AppSync with Lambda functions a read.

Tips for AWS AppSync

  1. Resolver-Level Caching – Turning on per-resolver caching provides resolver-specific settings for the arguments, source, and identity maps on which to base cache hits. This provides another level of caching beyond full request caching, letting each resolver cache based on its specific data requests.
  2. HTTP Resolvers as HTTP Actions – A popular capability to pair with GraphQL APIs is the ability to issue an HTTP action as a request, and then build the response based on that call. Use an HTTP resolver to accomplish this task based on your API needs. This provides a means to connect to other APIs, whether they use GraphQL, REST, or other options.
  3. Avoid VTL with Lambda Resolvers – If you need a particular language or functionality for an API, then Lambda resolvers introduce the option to use any supported language to connect with any type of data source, including Amazon RDS with PostgreSQL, or Aurora, and to work with the results as needed. Check out the article “Introducing Direct Lambda Resolvers: AWS AppSync GraphQL APIs without VTL” for a deep dive.
  4. Amplify Resolver Autogeneration – An excellent way to try out AWS AppSync is to use the Amplify Studio or Amplify CLI tooling to autogenerate your GraphQL API, including all of the VTL resolver code, based on a schema. Amplify Studio also provides a way to build out a schema graphically, drawing relationships with immediate schema updates, and to deploy the API with fully functional resolvers built.
  5. Custom Authentication with Lambda Resolvers – There are many authentication options with AWS AppSync, including Amazon Cognito, OIDC, IAM, and API keys. Moreover, if a custom authentication option is needed, then Lambda resolvers provide a way to authenticate or authorize data consumers against the API.
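For the custom authentication tip above, AWS AppSync invokes a Lambda authorizer with the caller’s authorization token and expects a response containing an `isAuthorized` flag. The sketch below shows that handler shape; the token comparison is a placeholder, and in practice you would validate a real credential (a JWT, a database lookup, and so on):

```javascript
// Sketch of an AWS AppSync Lambda authorizer handler. The token check is a
// placeholder -- validate a real credential here in practice.
async function handler(event) {
  const token = event.authorizationToken;
  const isAuthorized = token === 'expected-token'; // placeholder check
  return {
    isAuthorized,
    // Values returned here surface to resolvers on $ctx.identity.resolverContext.
    resolverContext: isAuthorized ? { userId: 'user-123' } : {},
  };
}

// In a Lambda deployment this would be exported as the module handler:
// exports.handler = handler;
```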

Key Differences Between Resolver Development Options

Categories: Self-Managed, Open Source GraphQL Servers vs. AWS AppSync Managed GraphQL APIs

Configuration – Self-managed: Many of the secrets for connections, database configuration, and other criteria must be managed, which requires a secrets vault or another system for keeping secrets and configurations managed across environments. AWS AppSync: Secrets are already managed across environments and require minimal interaction from the developer. In many ways, the data sources in an AWS AppSync API are seamlessly integrated, and the developer can focus more on the model and the organizational use case.

Data – Self-managed: Full control over the exact request and response of data inbound and outbound from your database. However, the downside is that each function can become a costly development effort. AWS AppSync: A managed solution means that you can not only control the request and response cycles with the various tooling, but also gain the immediacy of generated query calls into your database. This provides a significant boost toward focusing on organizational use cases.

Developer – Self-managed: Self-hosted solutions, such as Apollo, can provide the most extensive flexibility around implementing code, databases, and models for your GraphQL API. AWS AppSync: A managed solution provides a dramatically faster onramp for the deployment of your API and streamlines much of the process. However, it can be limiting if more elaborate and complex coding is needed.

Logging – Self-managed: You must determine exactly what is needed, and then build out that solution, connecting it to the GraphQL server that you’ve decided on. AWS AppSync: CloudWatch provides an easy way to simply turn on logging, and tracing can be turned on with AWS X-Ray to bolster your insight into the system.

Cost – Self-managed: A self-hosted solution introduces a range of additional costs, including servers, functions, individual resources, possibly additional staff, and many others. AWS AppSync: A managed solution provides a single line item based on usage only.

Next Steps

Try out AWS AppSync with a quick getting started tutorial. Or, to get directly into testing some custom resolvers, check out the tutorials in the resolver tutorials section of the AWS AppSync resources.

Conclusion

Regardless of whether it’s self-hosted or managed, you should focus on the original tenets and advantages that GraphQL provides. For example, resolvers should be built to prevent over-fetching and under-fetching, provide clear dependency graphs between parent and child data, and replace multiple round trips with single responses. Both self-hosted and managed options can provide you with extensive tooling to build out your GraphQL API with these tenets in place. Moreover, the key question should be, “Which option will get you to your desired outcome most effectively, quickly, and with the least impact over time?” Answering that will give you your best choice: self-hosted, managed, or perhaps even both.

About the author

Adron Hall

I’m a jovial, proactive, test & code, code & test, get things done well, software architect, engineer, coder, and distributed systems advocate, while being a technical program manager at Amazon!