DAG Reference

Queryvine DAGs (directed acyclic graphs) let you compose multi-step pipelines with explicit data dependencies, parallel execution groups, and conditional branching logic.

Defining tasks

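A DAG is defined in YAML as a list of tasks. Each task has an id and a connector, plus connector-specific fields such as table, topic, or job_id. An optional depends_on list names the tasks that must finish first and so controls execution order.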

dag-definition.yaml

dag_id: nightly-revenue-pipeline
schedule: "0 2 * * *"
tasks:
  - id: ingest_orders
    connector: postgres
    table: orders
    destination: snowflake.raw.orders

  - id: ingest_line_items
    connector: postgres
    table: line_items
    destination: snowflake.raw.line_items

  - id: transform_revenue
    connector: dbt_cloud
    job_id: "dbt_job_8810"
    depends_on: [ingest_orders, ingest_line_items]

Dependencies

Tasks in the same DAG that do not depend on each other through depends_on, directly or transitively, run in parallel. Queryvine automatically detects independent branches and schedules them concurrently.

In the example above, ingest_orders and ingest_line_items run in parallel. transform_revenue starts only after both upstream tasks complete successfully.
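
The scheduling rule above can be illustrated with a short Python sketch. It is not Queryvine's scheduler; it only shows one way parallel groups could be derived from depends_on, assuming each task is a dict with an id and an optional depends_on list. The function name parallel_groups is hypothetical.

```python
def parallel_groups(tasks):
    """Group task ids into waves; every task in a wave has all its dependencies met."""
    deps = {t["id"]: set(t.get("depends_on", [])) for t in tasks}
    unknown = {d for ds in deps.values() for d in ds} - set(deps)
    if unknown:
        raise ValueError(f"unknown task ids in depends_on: {sorted(unknown)}")

    groups = []
    done = set()
    while len(done) < len(deps):
        # A task is ready once all of its upstream tasks are in `done`.
        ready = sorted(t for t, ds in deps.items() if t not in done and ds <= done)
        if not ready:
            raise ValueError("cycle detected: the task graph is not a DAG")
        groups.append(ready)
        done.update(ready)
    return groups


tasks = [
    {"id": "ingest_orders"},
    {"id": "ingest_line_items"},
    {"id": "transform_revenue", "depends_on": ["ingest_orders", "ingest_line_items"]},
]
print(parallel_groups(tasks))
# [['ingest_line_items', 'ingest_orders'], ['transform_revenue']]
```

Each inner list is a set of tasks that could run concurrently. The sketch ignores task failures, which in Queryvine prevent downstream tasks from starting.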

Conditional branches

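Use run_if to run a task only when a condition on upstream task outcomes or drift event results is met. In the example below, notify_on_drift runs after ingest_orders only if drift of severity warning or critical was detected: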

dag-definition.yaml (conditional branch)

  - id: notify_on_drift
    connector: webhook
    url: "https://hooks.slack.com/services/T00/B00/xxx"
    depends_on: [ingest_orders]
    run_if:
      condition: drift_detected
      severity: [warning, critical]
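
How such a condition might be evaluated can be sketched in Python. This is an illustration, not Queryvine's implementation: the should_run function, the shape of the drift event (a dict with a severity key, or None when no drift was detected), and the "info" severity used in the example are all assumptions.

```python
def should_run(run_if, drift_event):
    """Return True if a task guarded by run_if should execute.

    run_if: dict parsed from the task's run_if block.
    drift_event: hypothetical dict such as {"severity": "warning"},
                 or None when the upstream task reported no drift.
    """
    if run_if.get("condition") != "drift_detected":
        raise ValueError(f"unsupported condition: {run_if.get('condition')!r}")
    if drift_event is None:
        return False
    allowed = run_if.get("severity")
    # Assumption: with no severity filter, any drift event satisfies the condition.
    return allowed is None or drift_event["severity"] in allowed


run_if = {"condition": "drift_detected", "severity": ["warning", "critical"]}
print(should_run(run_if, {"severity": "critical"}))  # True
print(should_run(run_if, {"severity": "info"}))      # False
print(should_run(run_if, None))                      # False
```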

Backfill strategies

When a pipeline is paused due to a critical drift event and then resumed, Queryvine can replay the missed windows. Set the backfill strategy on the DAG:

backfill config

backfill:
  strategy: incremental   # or: full_reload, none
  max_lookback_hours: 72
  batch_size_rows: 100000
  parallelism: 4          # concurrent batches

incremental: replay only the rows created or updated during the paused window, selected by the source's updated_at column.
full_reload: truncate the destination table and re-ingest the entire source. Use for small tables or when schema changes make an incremental replay unsafe.
none: resume from the current state without backfilling the gap.
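
As a rough sketch of how the incremental settings could interact, the Python below computes a replay window and a batch count. It rests on two assumptions not confirmed above: that max_lookback_hours caps how far back from the resume time the replay may reach, and that rows are replayed in batches of batch_size_rows. The function names are hypothetical.

```python
import math
from datetime import datetime, timedelta


def incremental_window(paused_at, resumed_at, max_lookback_hours):
    """Return the (start, end) updated_at range to replay.

    Assumption: max_lookback_hours limits how far back from the resume
    time the replay may reach.
    """
    earliest = resumed_at - timedelta(hours=max_lookback_hours)
    return max(paused_at, earliest), resumed_at


def batch_count(row_count, batch_size_rows):
    """Number of batches needed to replay row_count rows."""
    return math.ceil(row_count / batch_size_rows)


paused = datetime(2024, 1, 1, 2, 0)
resumed = datetime(2024, 1, 5, 2, 0)  # 96 hours later, beyond the 72-hour lookback
start, end = incremental_window(paused, resumed, max_lookback_hours=72)
print(start, end)                     # 2024-01-02 02:00:00 2024-01-05 02:00:00
print(batch_count(250_000, 100_000))  # 3
```

Under these assumptions, with parallelism set to 4, all three batches in the example could run concurrently.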