Anomaly monitors can help you catch important issues you may otherwise have missed and are a core component of building reliability into business-critical data products.
Monitor setup considerations
It’s essential to be mindful of balancing coverage and signal-to-noise ratio. Maximal monitoring across the stack may be appealing, but it’s not always the best option: it can cause alert fatigue, wearing out your data team and letting important issues slip through. The right monitor placement strategy can help you with this.
While there’s no one-size-fits-all monitoring strategy, here are our recommendations for how to approach setting up monitors. We’re always up for a chat if you want advice on setting up monitors for your specific situation.
1. Monitor data flowing to the warehouse
Automated freshness monitors complement explicit checks such as dbt source freshness checks: they catch cases where you don’t know the exact threshold, or where you didn’t anticipate that a freshness check was needed.
- As a rule of thumb, add freshness monitors to all your sources where you expect data to flow regularly. Use, e.g., dbt source freshness checks instead when you have explicit expectations of when data should be refreshed and want to encode them as an SLA agreed with the business.
- To avoid a freshness issue in an upstream table triggering freshness alerts in dozens of tables downstream, place freshness checks only on the most upstream tables. You’ll get one cohesive alert for a comprehensive analysis rather than disjointed alerts.
- Combine freshness and volume monitors at sources to detect when data is flowing but at a reduced rate.
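Where you do have an explicit expectation, a dbt source freshness check can encode it as an SLA. A minimal sketch of a sources.yml entry (the source, table, and column names here are illustrative assumptions):

```yaml
# models/sources.yml — illustrative; names and thresholds are assumptions
version: 2

sources:
  - name: payments
    loaded_at_field: _loaded_at  # timestamp column used to measure freshness
    freshness:
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: transactions
```

Running `dbt source freshness` then warns or errors when the latest `_loaded_at` value is older than the configured thresholds.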
Setting up source monitors
2. Deploy volume monitors to detect abnormal table shrink or growth
Placing volume monitors on the critical path (tables upstream of your most important data assets) helps identify table shrink or growth that may otherwise be hard to detect. For example, a one-to-many relationship between the primary and joined tables can cause an aggregate metric to be double-counted.
We recommend two different approaches depending on your risk tolerance and the number of data assets.
Start by placing volume monitors on important data assets—typically, these will be downstream tables fed directly into other tools such as ML models or BI reports. By placing monitors at the most downstream layer, you catch both issues that originate upstream, e.g., a drop in volume following a partial outage in an upstream engineering system, and issues introduced through data transformations, such as a faulty join creating duplicates.
Using the Important filter, select all assets you’ve marked as important.
Place monitors on tables you’ve marked as important + all upstream dependencies—monitoring all tables on the critical path has two added benefits: 1) you can more quickly identify the most upstream issue to get to the root cause of the problem, and 2) you’ll catch issues that may otherwise be missed, such as fan-outs where an aggregate metric in a downstream table is computed from duplicate rows introduced in a join.
Some downstream tables can have hundreds of upstream dependencies. In these cases, we recommend being more selective about where you place your monitors upstream.
Query: upstream(important) to automatically place monitors on all tables upstream of important assets
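The core idea behind a volume monitor—flag a table whose row count deviates sharply from its recent history—can be sketched in a few lines. This is an illustrative simplification (function name, window, and threshold are assumptions; a real monitor would also model seasonality and trend):

```python
# Minimal sketch of a volume check: flag a daily row count that deviates
# more than `z_threshold` standard deviations from the trailing window.
from statistics import mean, stdev

def volume_anomaly(history: list[int], today: int, z_threshold: float = 3.0) -> bool:
    """Return True if today's row count is anomalous vs. the trailing history."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:  # perfectly flat history: any change is a deviation
        return today != mu
    return abs(today - mu) / sigma > z_threshold

counts = [10_120, 9_980, 10_050, 10_210, 9_940, 10_080, 10_140]
print(volume_anomaly(counts, 10_100))  # → False: a typical day
print(volume_anomaly(counts, 4_900))   # → True: the table shrank by ~50%
```

The same shape of check catches both abnormal shrink (partial upstream outage) and abnormal growth (e.g., duplicates from a faulty join).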
Setting up table monitors
3. Extensively monitor downstream business-critical data assets
Even when you have a well-tested data pipeline with freshness checks, data quality tests, and volume monitoring, you may still be blind to more nuanced issues. We recommend selectively deploying custom monitors depending on your specific use case. For example:
- Detect anomalies in a metric value—e.g., detect a spike or drop in your conversion rate metric with a custom SQL monitor
- Detect a drop or spike for a specific segment—e.g., detect if the number of customers per geography spikes or drops with a custom SQL monitor with segmentation
- Detect statistical issues in a field—e.g., detect if the % of null or empty values increases for a field used in an ML model with a field stats monitor
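To make the first example concrete, here is a sketch of what a custom SQL check on a conversion-rate metric could look like, run here against an in-memory SQLite database. The table, column names, and expected band are all illustrative assumptions; an anomaly monitor would learn the band from history rather than hard-code it:

```python
# Sketch of a custom SQL metric check: compute a daily conversion rate in SQL,
# then flag days that fall outside an expected band.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sessions (day TEXT, visits INTEGER, orders INTEGER);
    INSERT INTO sessions VALUES
        ('2024-06-01', 1000, 52),
        ('2024-06-02', 1040, 49),
        ('2024-06-03',  980,  9);  -- conversion drops sharply on this day
""")

rows = conn.execute("""
    SELECT day, CAST(orders AS REAL) / visits AS conversion_rate
    FROM sessions
    ORDER BY day
""").fetchall()

LOW, HIGH = 0.03, 0.08  # expected band for the metric (assumed, not learned)
anomalies = [(day, rate) for day, rate in rows if not LOW <= rate <= HIGH]
print(anomalies)  # only the 2024-06-03 drop is flagged
```

The segmented variant is the same query with an extra GROUP BY column (e.g., geography), evaluating the band per segment.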
Custom monitor detecting issues in average sales grouped by geography
Setting up custom monitors
We’ve written an article with in-depth recommendations if you’re using Synq alongside dbt: Anomaly monitors and dbt tests to ensure the quality of business-critical pipelines