Rahul Mehta

Share AWS Redshift data across accounts

August 3, 2020 by Rahul Mehta

This article provides a step-by-step explanation of how to share AWS Redshift database snapshots with other AWS accounts, so that data can be ported from one AWS account to another.

Introduction

AWS Redshift is a columnar data warehouse service that is often used for massive data aggregation and correlation, and it can host petabyte-scale data in a clustered model. In a typical SDLC setup on the cloud, separate accounts are used for environments like Dev, Stage, Test, and Production. As with any other database system, there is a need to port data held in AWS Redshift clusters from one environment to another. As the data held in Redshift clusters can be massive, moving it across multiple accounts can be challenging, and it can increase cost and redundancy in the other accounts.

One option for sharing data with other accounts is to extract all the data from the Redshift cluster into another service like AWS S3, transfer it either online using programmatic methods or offline by moving it to an appliance or an on-premises location, and then re-upload the same data in the new account. While these methods can achieve the purpose, they are neither scalable nor cost-efficient. Also, by taking the data out of the cluster, the metadata and the model of the database objects may be lost. A standard method of transferring data between AWS Redshift clusters, within as well as across AWS accounts, is to create a snapshot of the cluster and then restore this snapshot into the cluster of choice. In this article, we will learn a mechanism to address this scenario by creating AWS Redshift database snapshots, sharing them with the desired accounts, and restoring them into an Amazon Redshift cluster.

AWS Redshift Setup

In this article, we start with a working AWS Redshift cluster, and it is assumed that the cluster already holds the data that needs to be shared with a different AWS account. Those who are new to AWS Redshift can refer to this article, Getting started with AWS Redshift, to create a new Redshift cluster. Once the cluster is created, it would look as shown below on the Amazon Redshift Clusters page. To simulate the scenario, it is recommended to create some test data of a reasonable volume so that the snapshot, when created, is large. Sample data is not necessary for this exercise, but with it you would be able to appreciate the value this feature provides for sharing large backups across AWS accounts, compared to other indirect methods of porting data.

Redshift cluster

In my last article, Managing snapshots in AWS Redshift clusters, we discussed AWS Redshift manual and automated snapshots, which are used for backup and recovery. Snapshots are also a vehicle for moving data from one cluster to another. When one cluster restores the snapshot of another cluster, the data held in the backup is ported automatically. We need a mechanism by which snapshots of an Amazon Redshift cluster hosted in one account can be accessed by another Amazon Redshift cluster hosted in a different account. Both types of snapshots are listed in the Snapshots section of the cluster properties, as shown below.

Redshift cluster snapshots

There may be a need to access data held in manual as well as automated snapshots from a different AWS account. An automated snapshot gets deleted automatically after its retention period, so it first needs to be copied to a manual snapshot from the Actions menu. It is therefore assumed that either a manual snapshot, or a manual copy of an automated snapshot, is already in place. Click on the manual snapshot to navigate to its details, which would look as shown below. Let's say that we intend to use the data contained in this snapshot in a different AWS account. For this purpose, we need to make this snapshot accessible to that account.

Redshift cluster snapshot details
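The copy of an automated snapshot to a manual one can also be scripted. Below is a minimal sketch, assuming the boto3 SDK for Python and its Redshift `copy_cluster_snapshot` call. The helper name and the snapshot identifiers are hypothetical and are not part of the original walkthrough, and the client is passed in so that the function can be tried without touching a real account.

```python
def copy_to_manual_snapshot(client, source_snapshot_id, target_snapshot_id,
                            source_cluster_id=None):
    """Copy an automated snapshot to a manual snapshot.

    `client` is assumed to be a boto3 Redshift client, for example:
        import boto3
        client = boto3.client("redshift")
    All identifiers passed in are placeholders chosen by the caller.
    """
    params = {
        "SourceSnapshotIdentifier": source_snapshot_id,
        "TargetSnapshotIdentifier": target_snapshot_id,
    }
    # The source cluster identifier is optional in the API, so it is only
    # sent when the caller provides it.
    if source_cluster_id is not None:
        params["SourceSnapshotClusterIdentifier"] = source_cluster_id

    response = client.copy_cluster_snapshot(**params)
    # The response describes the newly created manual snapshot.
    return response["Snapshot"]["SnapshotIdentifier"]
```

The returned identifier is that of the manual copy, which is the snapshot that would then be shared with the other account.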

Click on the Edit button, and you would find the snapshot settings as shown below. Here we have the option to provide another AWS account with which we intend to share the snapshot. Assuming you have another account, type its 12-digit account ID in the Account box under the Manage Access section, click on the Add account button, and then click on Save to save the modifications. This manual snapshot was created in Account-1, as you can see in the top-right section of the figure below, and the account to which we are provisioning access is Account-2. To access the snapshot from Account-2, we need to log in to Account-2 and navigate to the Snapshots section of AWS Redshift.

Redshift cluster snapshot access management
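For readers who prefer scripting over the console, the Manage Access step above roughly corresponds to the Redshift `authorize_snapshot_access` call in the boto3 SDK for Python. The following is a sketch under that assumption. The helper name is hypothetical, and the 12-digit check simply mirrors the account ID format mentioned above.

```python
import re


def share_snapshot(client, snapshot_id, account_id):
    """Grant another AWS account restore access to a manual snapshot.

    `client` is assumed to be a boto3 Redshift client created in the owner
    account (Account-1), for example boto3.client("redshift").
    """
    # AWS account IDs are 12 digits; reject anything else before calling AWS.
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError("account_id must be a 12-digit AWS account ID")

    response = client.authorize_snapshot_access(
        SnapshotIdentifier=snapshot_id,
        AccountWithRestoreAccess=account_id,
    )
    # The response lists every account that now has restore access.
    granted = response["Snapshot"].get("AccountsWithRestoreAccess", [])
    return [entry["AccountId"] for entry in granted]
```

Calling the helper once per consuming account would be the scripted equivalent of adding several accounts in the console before saving.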

After logging on to Account-2, you would be able to find the shared snapshot listed as shown below.

Redshift cluster shared snapshot details
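The same listing can be retrieved programmatically from Account-2. The sketch below assumes the boto3 Redshift `describe_cluster_snapshots` call with its `OwnerAccount` filter, which restricts the results to snapshots owned by a particular account. The helper name is hypothetical, and paging is handled with the `Marker` value returned by the call.

```python
def list_shared_snapshots(client, owner_account_id):
    """List snapshots visible to this account that are owned by another account.

    `client` is assumed to be a boto3 Redshift client created with Account-2
    credentials; `owner_account_id` is the 12-digit ID of Account-1.
    """
    snapshots = []
    marker = None
    while True:
        params = {"OwnerAccount": owner_account_id}
        if marker:
            params["Marker"] = marker  # continue from the previous page
        page = client.describe_cluster_snapshots(**params)
        snapshots.extend(page.get("Snapshots", []))
        marker = page.get("Marker")
        if not marker:
            break

    # Keep only the fields needed to pick a snapshot for restoring.
    return [
        (s["SnapshotIdentifier"], s["ClusterIdentifier"], s["Status"])
        for s in snapshots
    ]
```

The snapshot identifier and the source cluster identifier returned here are the values needed later when restoring the shared snapshot.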

Click on the Actions menu to see the list of actions that can be performed on this snapshot. Not all the actions shown in the Actions menu can be performed on shared snapshots, though they may appear to be available. One such action is Delete snapshot. Let's say that the owner account that created the snapshot has shared it with multiple accounts. If one of the accounts with access to this snapshot were to delete it by mistake, the other accounts would lose access to it, and the source account that created the snapshot might be impacted as well. So, ideally, the consuming accounts should not be able to delete a snapshot shared by the owner account.

Redshift cluster shared snapshot deletion

And that is the case here. Select the snapshot, click on the Actions menu and select the Delete snapshot option. A message would pop up as shown below, informing that only the original account that created the snapshot can delete it.

Redshift cluster shared snapshot deletion

Navigate to the Clusters page in Account-2, and you would find that although the snapshot has been made available by sharing, no cluster is created automatically to use it. It is up to the user to restore the snapshot by creating a cluster.

Redshift clusters

To restore the snapshot, navigate back to the Snapshots section, select the shared snapshot and click on the Restore snapshot option. That would bring up a page as shown below. The settings would be pre-populated and identical to the settings of the cluster from which the snapshot was created.

Redshift cluster restore from snapshots
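The restore can be scripted as well. Here is a minimal sketch, assuming the boto3 Redshift `restore_from_cluster_snapshot` call, in which `OwnerAccount` has to be supplied when restoring a snapshot owned by a different account. The helper name, its `wait` flag and all identifiers are hypothetical. The optional wait relies on the `cluster_restored` waiter that boto3 provides for Redshift.

```python
def restore_shared_snapshot(client, new_cluster_id, snapshot_id,
                            source_cluster_id, owner_account_id, wait=False):
    """Create a new cluster in this account from a snapshot shared by another.

    `client` is assumed to be a boto3 Redshift client created with Account-2
    credentials; `owner_account_id` is the 12-digit ID of Account-1.
    """
    response = client.restore_from_cluster_snapshot(
        ClusterIdentifier=new_cluster_id,          # name of the new cluster
        SnapshotIdentifier=snapshot_id,            # the shared snapshot
        SnapshotClusterIdentifier=source_cluster_id,
        OwnerAccount=owner_account_id,             # needed for shared snapshots
    )
    if wait:
        # Block until Redshift reports that the restore has completed.
        client.get_waiter("cluster_restored").wait(
            ClusterIdentifier=new_cluster_id
        )
    # Status reported at request time, typically before the restore finishes.
    return response["Cluster"]["ClusterStatus"]
```

Only the identifiers are passed here, which leaves the remaining cluster settings to the service defaults for a restore. Any other settings would need to be added as extra parameters.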

Once the cluster is restored from the snapshot, the data becomes accessible in Account-2. From this point on, Account-2 no longer depends on the snapshot that was shared from the original account, so the sharing can be removed, and the snapshot can even be deleted if it is no longer required.

Redshift cluster restored from snapshot

To delete the snapshot in the original account, i.e. Account-1, log on to this account and navigate to the Snapshots section. Select the snapshot, open the Actions menu and click on Delete snapshot. You would find an error as shown below. The reason for this error is that as long as the snapshot is shared with other accounts, even the original account that created the snapshot cannot delete it.

Redshift cluster snapshot delete error

To remove the sharing, open the manual snapshot and click on the Edit button. Click on the Remove account button for each account to which access has been provided. Once all the accounts are removed, repeat the delete step above and the snapshot should get deleted.

Redshift cluster snapshot remove shared access
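The revoke-then-delete sequence described above can be sketched in code too. This example assumes the boto3 Redshift calls `describe_cluster_snapshots`, `revoke_snapshot_access` and `delete_cluster_snapshot`. The helper name is hypothetical, and the function is meant to be run with the credentials of the owner account (Account-1).

```python
def unshare_and_delete_snapshot(client, snapshot_id):
    """Revoke every account's access to a manual snapshot, then delete it.

    `client` is assumed to be a boto3 Redshift client created in the owner
    account, for example boto3.client("redshift").
    """
    described = client.describe_cluster_snapshots(SnapshotIdentifier=snapshot_id)
    snapshot = described["Snapshots"][0]

    revoked = []
    # Access has to be revoked for all accounts before deletion can succeed.
    for entry in snapshot.get("AccountsWithRestoreAccess", []):
        client.revoke_snapshot_access(
            SnapshotIdentifier=snapshot_id,
            AccountWithRestoreAccess=entry["AccountId"],
        )
        revoked.append(entry["AccountId"])

    client.delete_cluster_snapshot(SnapshotIdentifier=snapshot_id)
    return revoked
```

The function returns the list of account IDs whose access was revoked, which can be logged as a record of who previously had access to the deleted snapshot.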

Conclusion

In this article, we learned how to configure AWS Redshift snapshots to provision access to other accounts, and how to use the shared snapshots to restore an Amazon Redshift cluster. We also learned the criteria that need to be satisfied to delete a shared snapshot, and the type of access that consuming accounts can exercise on a shared snapshot.

Table of contents

Getting started with AWS Redshift
Access AWS Redshift from a locally installed IDE
How to connect AWS RDS SQL Server with AWS Glue
How to catalog AWS RDS SQL Server databases
Backing up AWS RDS SQL Server databases with AWS Backup
Load data from AWS S3 to AWS RDS SQL Server databases using AWS Glue
Load data into AWS Redshift from AWS S3
Managing snapshots in AWS Redshift clusters
Share AWS Redshift data across accounts
Export data from AWS Redshift to AWS S3
Restore tables in AWS Redshift clusters
Getting started with AWS RDS Aurora DB Clusters
Saving AWS Redshift costs with scheduled pause and resume actions


About Rahul Mehta

Rahul Mehta is a Software Architect with Capgemini focusing on cloud-enabled solutions. He works on various cloud-based technologies like AWS, Azure, and others. He has worked internationally with Fortune 500 clients in various sectors and is a passionate author.
