This guide shows you how to securely connect your Airflow setup to Coalesce Quality. The integration extracts metadata about Airflow; by default, all tasks and all DAGs are reported. To complete this guide, you'll need the following:
→ Access to modify your Airflow configuration code
⏱️ Estimated time to finish: 10 minutes.

Setup with OpenLineage

  1. In the app, create a new Airflow integration. Make sure that you select OpenLineage as the processing mode.
  2. Once the integration has been created, use the generated token to configure Airflow to securely forward your events to the ingestion service.
Airflow uses a transport configuration option to define how to forward these events.
  • URL: https://developer.synq.io
  • Endpoint: /api/ingest/openlineage/v1
  • API Key: Generated by the Airflow integration.
With that information, you can follow the Airflow OpenLineage documentation to configure your Airflow instance.

Example OpenLineage Transport Value

{
  "type": "http",
  "url": "https://developer.synq.io",
  "endpoint": "/api/ingest/openlineage/v1",
  "auth": {
    "type": "api_key",
    "api_key": "st-***" // redacted
  }
}
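If you prefer environment variables over a config file, the Airflow OpenLineage provider can also read the transport definition from the `AIRFLOW__OPENLINEAGE__TRANSPORT` variable as a JSON string. A minimal sketch, mirroring the transport value above (the API key is a placeholder for the token generated by your Airflow integration):

```shell
# Define the OpenLineage HTTP transport via an environment variable.
# The JSON mirrors the example transport value; replace the api_key
# placeholder with the token generated by your Airflow integration.
export AIRFLOW__OPENLINEAGE__TRANSPORT='{
  "type": "http",
  "url": "https://developer.synq.io",
  "endpoint": "/api/ingest/openlineage/v1",
  "auth": {"type": "api_key", "api_key": "<your-token>"}
}'
```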

Setup with DataHub

  1. Install the required dependencies in your Airflow environment:
pip install 'acryl-datahub-airflow-plugin>=1.1.0.4' apache-airflow-providers-openlineage
For detailed installation instructions and compatibility information, see the DataHub Airflow documentation.
  2. Set up the REST hook with the following connection details:
  • conn-host: https://datahubapi.synq.io/datahub/v1/
  • conn-password: Token you obtain when you click ‘Create’ on this page
airflow connections add --conn-type 'datahub_rest' 'datahub_rest_default' --conn-host '<host>' --conn-password '<token>'
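As an alternative to the CLI command, Airflow (2.3+) also picks up connections from environment variables named `AIRFLOW_CONN_<CONN_ID>`. A sketch for the same `datahub_rest_default` connection, where the token is a placeholder; the JSON form avoids URI-escaping the token:

```shell
# Define the DataHub REST connection via an environment variable.
# The token is a placeholder for the value obtained from the app.
export AIRFLOW_CONN_DATAHUB_REST_DEFAULT='{
  "conn_type": "datahub_rest",
  "host": "https://datahubapi.synq.io/datahub/v1/",
  "password": "<token>"
}'
```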

Airflow 2.7+ with OpenLineage Provider

If you’re using Airflow 2.7+, the native Airflow OpenLineage provider will improve the quality of lineage and metadata information obtained from your Airflow setup. The OpenLineage provider package is already included in the installation above since the DataHub plugin requires it. For AWS MWAA, add both acryl-datahub-airflow-plugin and apache-airflow-providers-openlineage to your requirements.txt file.
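For MWAA, the corresponding requirements.txt entries would look like this (version pins follow the pip command above):

```text
acryl-datahub-airflow-plugin>=1.1.0.4
apache-airflow-providers-openlineage
```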

Log Forwarding

Log forwarding is required to include full task failure logs in alerts. Multiple methods are supported:

AWS MWAA CloudWatch Logs

For AWS Managed Workflows for Apache Airflow (MWAA), you can forward logs from CloudWatch using the synq-aws-cloudwatch Lambda function:
  1. Deploy the Lambda function: Use the synq-aws-cloudwatch repository to deploy a Lambda function that forwards CloudWatch logs to Coalesce Quality.
  2. Configure log forwarding: The Lambda automatically forwards Airflow logs from CloudWatch when properly configured with your credentials.
  3. Set up log group subscription: Configure CloudWatch to trigger the Lambda when new Airflow logs are available.
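Step 3 can be done with the AWS CLI. A sketch, assuming the log group name, region, account ID, and function name shown here are placeholders for your own MWAA environment and deployed Lambda:

```shell
# Subscribe the MWAA Airflow task log group to the forwarding Lambda.
# The log group name and function ARN are placeholders; adjust them
# to match your environment and the deployed synq-aws-cloudwatch Lambda.
aws logs put-subscription-filter \
  --log-group-name "airflow-<environment-name>-Task" \
  --filter-name "synq-log-forwarding" \
  --filter-pattern "" \
  --destination-arn "arn:aws:lambda:<region>:<account-id>:function:<function-name>"
```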

API Endpoint

You can also send logs directly using the API endpoint. This endpoint accepts Airflow log data for processing and analysis.

Remote Logging (S3/GCS)

You can also upload logs after files are created on S3 or GCS using Airflow’s Remote Logging feature. For assistance with this setup, please reach out to our support team.
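For reference, Airflow's remote logging is enabled in the [logging] section of airflow.cfg (or via the matching AIRFLOW__LOGGING__* environment variables). A sketch assuming an S3 bucket and an existing AWS connection ID, both placeholders:

```ini
[logging]
remote_logging = True
remote_base_log_folder = s3://<your-log-bucket>/airflow/logs
remote_log_conn_id = aws_default
```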

Read more

For more information about the plugins used to collect task execution and lineage data: