DAG Reference
Queryvine DAGs (directed acyclic graphs) let you compose multi-step pipelines with explicit data dependencies, parallel execution groups, and conditional branching logic.
Defining tasks
A DAG is defined in YAML as a list of tasks. Each task has an id and a connector, plus connector-specific fields such as table, topic, or job_id. An optional depends_on list controls execution order.
dag_id: nightly-revenue-pipeline
schedule: "0 2 * * *"
tasks:
  - id: ingest_orders
    connector: postgres
    table: orders
    destination: snowflake.raw.orders
  - id: ingest_line_items
    connector: postgres
    table: line_items
    destination: snowflake.raw.line_items
  - id: transform_revenue
    connector: dbt_cloud
    job_id: "dbt_job_8810"
    depends_on: [ingest_orders, ingest_line_items]
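Because depends_on entries refer to other tasks by id, a task list is only a valid DAG if every reference resolves and no cycle exists. The following is a minimal Python sketch of such a check, not Queryvine's own validator: the TASKS list is a hand-written in-memory copy of the example above, and validate_tasks is a hypothetical helper name. It relies only on the standard-library graphlib module (Python 3.9+).

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical in-memory form of the example DAG's task list.
TASKS = [
    {"id": "ingest_orders", "connector": "postgres"},
    {"id": "ingest_line_items", "connector": "postgres"},
    {
        "id": "transform_revenue",
        "connector": "dbt_cloud",
        "depends_on": ["ingest_orders", "ingest_line_items"],
    },
]


def validate_tasks(tasks):
    """Return a list of problems found in a task list (empty if none)."""
    problems = []
    ids = [t["id"] for t in tasks]
    known = set(ids)

    # Duplicate ids would make depends_on references ambiguous.
    for task_id in sorted(known):
        if ids.count(task_id) > 1:
            problems.append(f"duplicate task id: {task_id}")

    # Every depends_on entry must name a task defined in the same list.
    for task in tasks:
        for dep in task.get("depends_on", []):
            if dep not in known:
                problems.append(f"{task['id']} depends on unknown task: {dep}")

    # The graph must be acyclic; prepare() raises CycleError otherwise.
    graph = {t["id"]: set(t.get("depends_on", [])) for t in tasks}
    try:
        TopologicalSorter(graph).prepare()
    except CycleError as exc:
        problems.append(f"dependency cycle: {exc.args[1]}")

    return problems


print(validate_tasks(TASKS))  # [] for the example DAG
```

Returning a list of problems rather than raising on the first one makes it easier to report every issue in a single pass.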
Dependencies
Tasks in the same DAG that have no dependency path between them, direct or transitive, run in parallel. Queryvine automatically detects independent branches and schedules them concurrently.
In the example above, ingest_orders and ingest_line_items run in parallel. transform_revenue only starts when both upstream tasks complete successfully.
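One way to picture this scheduling is to group tasks into "waves", where each wave holds the tasks whose dependencies have all finished. The Python sketch below illustrates the idea with the standard-library graphlib module. It is not Queryvine's scheduler, and execution_waves is a hypothetical name. A real scheduler would likely start each task as soon as its own dependencies finish rather than waiting for a whole wave.

```python
from graphlib import TopologicalSorter


def execution_waves(depends_on):
    """Group task ids into waves of tasks that could run concurrently.

    depends_on maps each task id to the set of task ids it depends on.
    """
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        # Everything returned here has all of its dependencies completed.
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


graph = {
    "ingest_orders": set(),
    "ingest_line_items": set(),
    "transform_revenue": {"ingest_orders", "ingest_line_items"},
}
print(execution_waves(graph))
# [['ingest_line_items', 'ingest_orders'], ['transform_revenue']]
```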
Conditional branches
Use run_if to define conditions based on upstream task outcomes or drift event results:
- id: notify_on_drift
  connector: webhook
  url: "https://hooks.slack.com/services/T00/B00/xxx"
  depends_on: [ingest_orders]
  run_if:
    condition: drift_detected
    severity: [warning, critical]
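The run_if block above can be read as a predicate over drift events. The Python sketch below shows one plausible reading, stated as an assumption rather than documented behaviour: the task runs if at least one upstream task listed in depends_on produced a drift event whose severity is in the configured list. DriftEvent and should_run are hypothetical names, and only the drift_detected condition from the example is modelled.

```python
from dataclasses import dataclass


@dataclass
class DriftEvent:
    """Hypothetical record of a drift event raised by a task."""
    task_id: str
    severity: str  # "warning" or "critical" in the example above


# In-memory form of the notify_on_drift task (url omitted for brevity).
NOTIFY = {
    "id": "notify_on_drift",
    "depends_on": ["ingest_orders"],
    "run_if": {"condition": "drift_detected", "severity": ["warning", "critical"]},
}


def should_run(task, drift_events):
    """Decide whether a task's run_if condition is satisfied (assumed semantics)."""
    run_if = task.get("run_if")
    if run_if is None:
        return True  # no condition: run whenever dependencies allow
    if run_if["condition"] != "drift_detected":
        raise ValueError(f"condition not modelled in this sketch: {run_if['condition']}")
    upstream = set(task.get("depends_on", []))
    allowed = set(run_if["severity"])
    # Assumption: only events raised by upstream tasks count.
    return any(e.task_id in upstream and e.severity in allowed for e in drift_events)


print(should_run(NOTIFY, [DriftEvent("ingest_orders", "critical")]))  # True
print(should_run(NOTIFY, []))  # False
```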
Backfill strategies
When a pipeline is paused due to a critical drift event and then resumed, Queryvine can replay the missed windows. Set the backfill strategy on the DAG:
backfill:
  strategy: incremental  # or: full_reload, none
  max_lookback_hours: 72
  batch_size_rows: 100000
  parallelism: 4  # concurrent batches
- incremental: replay only the rows created/updated during the paused window using the source's updated_at column.
- full_reload: truncate the destination table and re-ingest the entire source. Use for small tables or when schema changes make incremental unsafe.
- none: resume from the current state without backfilling the gap.
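To get a feel for how the backfill settings interact, the Python sketch below estimates the work for an incremental backfill. It rests on assumptions that the text above does not confirm: that max_lookback_hours caps how far back from the resume time the replay window may reach, and that batches are processed in rounds of up to parallelism at a time. plan_incremental_backfill and its return keys are hypothetical names, and the timestamps and row count are made-up inputs.

```python
import math
from datetime import datetime, timedelta, timezone


def plan_incremental_backfill(paused_at, resumed_at, changed_rows,
                              max_lookback_hours=72,
                              batch_size_rows=100_000,
                              parallelism=4):
    """Estimate the replay window and batch count (assumed semantics)."""
    # Assumption: the window never reaches further back than the lookback cap.
    earliest = resumed_at - timedelta(hours=max_lookback_hours)
    window_start = max(paused_at, earliest)
    batches = math.ceil(changed_rows / batch_size_rows)
    return {
        "window_start": window_start,
        "window_end": resumed_at,
        "truncated": paused_at < earliest,  # part of the gap is not replayed
        "batches": batches,
        "rounds": math.ceil(batches / parallelism),
    }


paused = datetime(2024, 1, 1, tzinfo=timezone.utc)
resumed = datetime(2024, 1, 5, tzinfo=timezone.utc)  # a 96-hour pause
plan = plan_incremental_backfill(paused, resumed, changed_rows=950_000)
print(plan["window_start"], plan["truncated"], plan["batches"], plan["rounds"])
# 2024-01-02 00:00:00+00:00 True 10 3
```

Under these assumptions, a pause longer than the lookback cap leaves part of the gap unreplayed, which is a case where full_reload may be the safer choice.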