AWS Database Blog

Part 1: Audit Aurora PostgreSQL databases using Database Activity Streams and pgAudit

As data volumes continue to grow, storing, sharing, protecting, and using data becomes increasingly challenging and complex.

Many enterprise organizations must comply with regulatory and compliance requirements that govern their databases, and these requirements are of paramount importance. Retaining the data that is relevant from an auditing standpoint is another important requirement.

In this post, we discuss two ways to audit Amazon Aurora PostgreSQL-Compatible Edition databases: the Database Activity Streams feature and the pgAudit extension. We review the advantages, limitations, and other considerations for each method. In Part 2 of this post, we discuss the steps to implement a database auditing solution using either option.

Overview of database auditing

In the last few years, the number and complexity of regulations that businesses are required to comply with has increased significantly. Industry-specific regulations include GDPR, SOX, PCI DSS, and HIPAA. The fines issued for breaches have also grown, making auditing more important than ever.

Your IT department should support business stakeholders by ensuring they receive near-real-time reporting on database activity. This means putting systems in place to automate reporting and to trigger alerts when key events occur that require closer attention.

In most cases, it’s compliance that drives database auditing. An organization might audit user access, security access, and other auditable data. You need a mechanism or system in place to perform the following functions:

  • Store near-real-time data
  • Monitor and audit database activity
  • Analyze data
  • Trigger alerts and notify stakeholders of suspicious database activity
  • Integrate with third-party compliance applications such as Imperva’s SecureSphere Database Audit and Protection, McAfee’s Data Center Security Suite, or IBM’s Infosphere Guardium

We can meet these compliance requirements either with services or scripts outside of the database or by using database-native mechanisms. Both approaches have their pros and cons. In this post, we compare Database Activity Streams, an Aurora feature that streams activity records, with pgAudit, a PostgreSQL engine extension that writes to the database log.

When performing database auditing, you should consider the following:

  • Performance overhead of enabling the audit
  • Storage requirements for storing audit logs depending on the granularity of logging
  • Protection of audit logs from manipulation after the fact
  • Cost implications, if any
  • Ease of installation, maintenance, and monitoring

Audit through Database Activity Streams

The Database Activity Streams feature provides a near-real-time stream of the activity in your relational database. When you integrate Database Activity Streams with monitoring tools, you can monitor and audit database activity.

This solution has the following advantages:

  • Provides a near real-time stream
  • Integrates with native AWS services and third-party monitoring tools
  • Captures database activities in detail
  • Separates duties between DBAs and security personnel
  • Encrypts audit data at rest with an AWS Key Management Service (AWS KMS) customer-managed key
  • Provides audit data in unified JSON format

We can choose between two auditing modes:

  • Asynchronous – In asynchronous mode, the database session returns to normal activities immediately while the activity stream event is made durable in the background. Asynchronous mode favors database performance over the recording of the activity stream: under heavy workloads, the stream might skip a few records to protect database performance, so the recorded stream can miss some events while the integrity of what is recorded is preserved.
  • Synchronous – In synchronous mode, the database session blocks other activities until the event is made durable in the activity stream. Synchronous mode is useful when capturing every record in the activity stream takes priority over performance.

Both modes trade database performance against durability of the recorded stream. If the requirement is to capture every record in the activity stream regardless of any possible impact on database performance, synchronous mode is preferred.
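You can start the activity stream with the AWS CLI, the console, or an AWS SDK. The following is a minimal sketch using the AWS SDK for Python (Boto3); the cluster ARN and KMS key are placeholders that you would replace with your own values.

    import boto3

    rds = boto3.client("rds")

    # Placeholder identifiers - replace with your own cluster ARN and KMS key
    CLUSTER_ARN = "arn:aws:rds:us-east-1:123456789012:cluster:my-aurora-cluster"
    KMS_KEY_ID = "my-kms-key-alias-or-id"

    # Start a database activity stream on the Aurora cluster.
    # Mode is either "async" (favors database performance) or
    # "sync" (favors durability of the recorded stream).
    response = rds.start_activity_stream(
        ResourceArn=CLUSTER_ARN,
        Mode="async",
        KmsKeyId=KMS_KEY_ID,
        ApplyImmediately=True,
    )

    # The response includes the Kinesis data stream that receives the records
    print(response["KinesisStreamName"], response["Status"])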

Considerations and limitations

When choosing to audit using Database Activity Streams, consider the performance impact and cost. Activating the feature doesn’t incur any additional cost. However, the streams have to be stored in Amazon Simple Storage Service (Amazon S3) or another storage solution, which does incur cost. You also have the option to convert the records from JSON to Apache Parquet or Apache ORC format, which is typically more efficient to query than JSON; however, format conversion incurs additional costs.

Additionally, if the database cluster has multiple user databases, activating the stream captures activities for all the databases in the cluster. If you want to capture activities for a specific database only, you can apply an additional filter to the captured stream with the help of AWS Lambda.
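As an illustration, the following sketch shows such a filter in a Lambda function that consumes the activity stream from Kinesis. It assumes the records have already been decrypted (the stream is encrypted with your KMS key; decrypting it with the AWS Encryption SDK is covered in the Database Activity Streams documentation). The decrypt_record helper and the target database name are hypothetical placeholders.

    import base64
    import json

    TARGET_DATABASE = "appdb"  # hypothetical database to keep

    def decrypt_record(raw_bytes):
        # Placeholder: decrypt the activity stream payload with the
        # AWS Encryption SDK and your KMS key, returning a JSON dict
        raise NotImplementedError

    def lambda_handler(event, context):
        matched = []
        for record in event["Records"]:
            payload = base64.b64decode(record["kinesis"]["data"])
            decrypted = decrypt_record(payload)
            # Decrypted payloads carry a databaseActivityEventList;
            # keep only the events for the database of interest
            for activity in decrypted.get("databaseActivityEventList", []):
                if activity.get("databaseName") == TARGET_DATABASE:
                    matched.append(activity)
        # Forward matched events to your downstream store or alerting target
        print(json.dumps(matched))
        return {"matchedEvents": len(matched)}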

Audit using pgAudit

The PostgreSQL Audit Extension (pgaudit) provides detailed session and object audit logging through the standard PostgreSQL logging facility.

Basic statement logging can be provided by the standard logging facility by setting log_statement to all, mod, or ddl. In addition, you can use log_connections and log_disconnections to log user connection and disconnection details, respectively.

This is acceptable for monitoring use cases, but doesn’t provide the level of detail generally required for an audit. It’s not enough to have a list of all the operations performed against the database. The auditor also needs to find particular statements that are of interest. The standard logging facility shows what the user requested, whereas pgAudit focuses on the details of what happened while the database was satisfying the request.

pgAudit allows you to set auditing on the instance level, user level, or database level:

  • To set auditing on the instance level, change the value of the pgaudit.log parameter in the DB parameter group (see the sketch after this list)
  • To set auditing on the database level, use ALTER DATABASE dbname set pgaudit.log='All';
  • To set auditing on the user level, use ALTER ROLE username set pgaudit.log='ROLE,WRITE,DDL,FUNCTION';
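The following Python sketch applies these settings programmatically: it sets pgaudit.log at the instance level through the parameter group with Boto3, and at the database and role level with psycopg2. The parameter group name, connection details, database, and role are placeholders, not values from this post.

    import boto3
    import psycopg2

    rds = boto3.client("rds")

    # Instance level: set pgaudit.log in the cluster parameter group
    # attached to the Aurora cluster (placeholder name)
    rds.modify_db_cluster_parameter_group(
        DBClusterParameterGroupName="my-aurora-pg-params",
        Parameters=[
            {
                "ParameterName": "pgaudit.log",
                "ParameterValue": "ddl,role,write",
                "ApplyMethod": "immediate",
            }
        ],
    )

    # Database and user level: run the ALTER statements shown above
    # (placeholder connection details, database, and role)
    conn = psycopg2.connect(host="my-cluster-endpoint", dbname="appdb",
                            user="postgres", password="placeholder")
    with conn, conn.cursor() as cur:
        cur.execute("ALTER DATABASE appdb SET pgaudit.log = 'all'")
        cur.execute("ALTER ROLE app_user SET pgaudit.log = 'role,write,ddl,function'")
    conn.close()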

The following are the formats available in pgaudit:

  • AUDIT_TYPE – SESSION or OBJECT.
  • STATEMENT_ID – The unique statement ID for this session. Each statement ID represents a backend call. Statement IDs are sequential even if some statements are not logged. There may be multiple entries for a statement ID when more than one relation is logged.
  • CLASS – For example, READ or ROLE.
  • COMMAND – For example, ALTER TABLE or SELECT.
  • OBJECT_TYPE – TABLE, INDEX, VIEW, and so on. Available for SELECT, DML, and most DDL statements.
  • OBJECT_NAME – The fully qualified object name (for example, public.account). Available for SELECT, DML, and most DDL statements.
  • STATEMENT – The statement run on the backend.
  • PARAMETER – If pgaudit.log_parameter is set, then this field contains the statement parameters as quoted CSV, or none if there are no parameters. Otherwise, the field is not logged.

You specify which classes of statements are logged by session audit logging. Possible values include the following:

  • READ – SELECT and COPY when the source is a relation or a query
  • WRITE – INSERT, UPDATE, DELETE, TRUNCATE, and COPY when the destination is a relation
  • FUNCTION – Function calls and DO blocks
  • ROLE – Statements related to roles and privileges: GRANT, REVOKE, CREATE/ALTER/DROP ROLE
  • DDL – All DDL that is not included in the ROLE class
  • MISC – Miscellaneous commands, such as DISCARD, FETCH, CHECKPOINT, VACUUM, and SET
  • MISC_SET – Miscellaneous SET commands, for example SET ROLE
  • ALL – Include all of the above

Considerations and limitations

When choosing pgAudit, consider the performance impact and cost, and keep in mind that autovacuum and autoanalyze operations are not logged. Additionally, statements that run after a transaction enters an aborted state aren’t audit logged. However, the statement that caused the error and any subsequent statements run in the aborted transaction are logged as errors by the standard logging facility.

If you need to capture additional database activity information in the logs for troubleshooting and performance tuning, you can use native database logging. For more information, check Working with RDS and Aurora PostgreSQL logs: Part 1 and Part 2.

Conclusion

In this post, we discussed the need for database auditing from a compliance perspective. We presented two auditing options available for databases on AWS: the Database Activity Streams feature in Aurora, and the pgAudit extension.

In Part 2, we discuss two use cases and how to implement a database auditing solution using either Database Activity Streams or pgAudit.

If you have any comments or questions about this post, please share them in the comments.


About the Author

HariKrishna Boorgadda is a Senior Consultant with the Professional Services team at Amazon Web Services. He focuses on database migrations to AWS and works with customers to design and implement Amazon RDS and Amazon Aurora architectures.

Swanand Kshirsagar is a Lead Consultant with the Professional Services team at Amazon Web Services. He works with customers to build scalable, highly available, and secure solutions in the AWS Cloud. His focus area is homogeneous and heterogeneous migrations of on-premises databases to Amazon RDS and Aurora PostgreSQL.

Rajesh Madiwale is a Lead Consultant with Amazon Web Services. He has deep expertise in database development and administration on Amazon RDS for PostgreSQL, Aurora PostgreSQL, Amazon Redshift, MySQL, and Greenplum databases. He is an ardent member of the PostgreSQL community and has worked on PostgreSQL throughout his career. He has also delivered several sessions at PostgreSQL conferences.