Redshift
Integrating Redshift with Synq
This guide explains how to securely connect Synq to Redshift.
We need this information so we collect metadata about your tables.
To be able to finish this guide, you’ll need the following:
→ Access to modify your Redshift configuration
⏱️ Estimated time to finish: 10 minutes.
Data we collect
For the automated data anomaly testing we collect the following:
- Number of rows in every table in the monitored dataset(s)
- Timestamp of the last change of data in all tables in the monitored dataset (see Freshness monitoring of tables for enabling this)
To provide out-of-the-box monitors for the volume of data Synq doesn’t require access to your actual data.
It needs access to query metadata from these tables:
Table Description svv_redshift_databases List of all the databases that a user has access to svv_all_schemas Summary of schemas svv_table_info Summary information for tables in the database svv_external_tables Details for external tables such as table and schema names
Step 4 goes through how to set permissions to allow access to querying raw data
Setup Redshift access
Create a dedicated Synq user with sufficient permissions
-
Open the Query Editor in Redshift and make sure that we got the right cluster and database selected
-
Run the following SQL to create a Synq user and group with the sufficient permissions where
<password>
is a secret you can set to be used by SynqCREATE GROUP synq; CREATE USER synq_user PASSWORD '<password>' IN GROUP synq; GRANT SELECT ON pg_catalog.svv_redshift_databases TO GROUP synq; GRANT SELECT ON pg_catalog.svv_all_schemas TO GROUP synq; GRANT SELECT ON pg_catalog.svv_all_tables TO GROUP synq; GRANT SELECT ON pg_catalog.svv_all_columns TO GROUP synq; GRANT SELECT ON pg_catalog.svv_table_info TO GROUP synq; GRANT SELECT ON pg_catalog.svv_external_tables TO GROUP synq;
-
Grant us permission to list tables and their columns in specific schemas. When used with views having
WITH NO SCHEMA BINDING
we needUSAGE
grant for both schemas, the one in which late binding view is defined and the one from which it is reading. This has to be repeated for every schema:GRANT usage ON schema <schema_name> TO GROUP synq;
To grant
USAGE
to all schemas one can use:DO $do$ DECLARE sch text; BEGIN FOR sch IN SELECT nspname FROM pg_namespace LOOP EXECUTE format($$ GRANT USAGE ON SCHEMA %I TO GROUP synq $$, sch); END LOOP; END; $do$;
-
Grant access to tables within a schema (optional)
If you want Synq to be able to run monitors on your raw data, you need to give usage and select permission. This is required for e.g., Custom SQL monitors
GRANT USAGE ON SCHEMA "schema" TO GROUP synq;
GRANT SELECT ON ALL TABLES IN SCHEMA "schema" TO GROUP synq;
ALTER DEFAULT PRIVILEGES IN SCHEMA "schema" GRANT SELECT ON TABLES TO GROUP synq;
Whitelist the Synq app
-
From Redshift, navigate to
Workgroup configuration
and click your default Workgroup -
Under Network and security click
Edit
and check the Turn on Publicly accessible checkmark. Then clickSave changes
-
Close the window to return to the Workgroup space and click your VPC security group
-
Under the Inbound rules tab click
Edit inbound rules
. Then clickAdd rule
and populate it with the following information- Type:
All Traffic
- Source:
Custom
with the following value in the text field:34.105.135.39
- Type:
-
Click
Save rules
Database, dataset and host name
-
Take a note of your database and dataset. These can be found in the Query Editor
-
Go to
Workgroup configuration
and select your default workgroup. Take a note of yourJDBC URL
Once you’ve run successfully completed these steps input the following data in Synq
Input data in the Synq UI
Integration name
For example Redshift
Database
The name of your database (e.g example_production_database
)
Host
The name of your host (e.g. cluser-name.us-east-1.redshift.amazonaws.com
)
To find the host, go to Workgroup configuration, and look at the Endpoint.
Copy everything until the amazonaws.com
and leave out the rest. In most cases, that will be :5439/dev
.
Paste it into the Host field in Synq.
Port
Port you specified earlier in the guide. Default port is 5439
User and password
User and password you specified earlier in the guide
Default dataset
Dataset you want Synq to monitor
In some cases, permission groups can take a while to propagate.
If you keep getting a Connect failed
error after double-checking your details, try again later.
Freshness monitoring of tables
When running on Redshift Serverless, we need access to query special SYS_* monitoring views. Follow the steps in the official Redshift Serverless documentation to configure IAM roles if needed https://docs.aws.amazon.com/redshift/latest/mgmt/serverless-monitoring.html
Once you do the above, you must grant Synq user access to query these views.
GRANT ROLE sys:monitor TO "synq_user";