Service Implementation Team Runbooks and Ownership

This section is for operational teams responsible for executing documented runbooks for recurring tasks and issue resolution.

Runbooks describe the step-by-step operational procedures that the Service Implementation Team must follow for recurring tasks and common issues.

Ownership

Role | Responsibility
Authoring and Updates | Product Engineering / Implementation team, with inputs from Tech BA, ETL Developer, and Metadata Architect.
Execution | Client Service Implementation Team Operations team.
Approval | Implementation Manager / Project Manager, in alignment with client operations lead.

Runbooks should explicitly identify which steps are Service Implementation Team-only, and which steps require escalation before execution.
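
One way to make that distinction explicit is to record an owner flag on every runbook step. The following is a minimal sketch only; the class names, field names, and the two example steps are hypothetical and are not taken from any existing runbook.

```python
from dataclasses import dataclass
from enum import Enum


class StepOwner(Enum):
    TEAM_ONLY = "Service Implementation Team-only"
    ESCALATE_FIRST = "Requires escalation before execution"


@dataclass(frozen=True)
class RunbookStep:
    number: int
    action: str
    owner: StepOwner


def steps_needing_escalation(steps):
    """Return the steps that must not be executed without prior escalation."""
    return [s for s in steps if s.owner is StepOwner.ESCALATE_FIRST]


# Hypothetical example steps, for illustration only.
example_steps = [
    RunbookStep(1, "Check the status dashboard for the failed job", StepOwner.TEAM_ONLY),
    RunbookStep(2, "Re-run the failed job for the affected batch", StepOwner.ESCALATE_FIRST),
]

for step in steps_needing_escalation(example_steps):
    print(f"Step {step.number}: escalate before '{step.action}'")
```

Which steps actually require escalation is for the runbook authors and approvers to decide; the sketch only shows how the flag could be carried alongside each step.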

Minimum Runbooks Required

At minimum, the following runbooks must exist and be accessible to the Service Implementation Team:

Runbook 1. Daily/Batch Execution Runbook

How to verify all scheduled jobs for the Canonical stack have run:

  • Marketing Data Mart → LDZ ingestion
  • LDZ → RDV transformations
  • RDV / Mart → Customer 360 / Campaign 360 / Flowchart 360 / cdm_publish_db
  • ML Model write-back: cdm_ingest_db → Customer 360 enrichment pipeline
Expected run times and typical durations.
Where to view status dashboards.
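
The verification step in Runbook 1 can be sketched as a comparison between the jobs expected for a batch and their latest known states. This is an illustration under assumptions: the job identifiers and state strings below are hypothetical placeholders, and in practice the states would come from the status dashboards or the scheduler.

```python
# Hypothetical job identifiers for the four flows listed above; real names will differ.
EXPECTED_JOBS = [
    "mart_to_ldz_ingestion",
    "ldz_to_rdv_transform",
    "rdv_to_360_publish",
    "ml_writeback_enrichment",
]


def find_unfinished_jobs(expected_jobs, latest_states):
    """Map each job that has not succeeded to its last known state.

    latest_states: dict of job name -> state string for the current batch.
    A job with no entry is reported as "not_run".
    """
    problems = {}
    for job in expected_jobs:
        state = latest_states.get(job, "not_run")
        if state != "success":
            problems[job] = state
    return problems


# Illustrative states only, as they might be copied from a status dashboard.
states = {
    "mart_to_ldz_ingestion": "success",
    "ldz_to_rdv_transform": "failed",
    "rdv_to_360_publish": "success",
}
print(find_unfinished_jobs(EXPECTED_JOBS, states))
# {'ldz_to_rdv_transform': 'failed', 'ml_writeback_enrichment': 'not_run'}
```

Treating a missing entry as "not_run" means a job that never started is reported alongside jobs that failed, rather than being silently skipped.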
Runbook 2. Failure Handling Runbook (Per Layer)
  • LDZ load failure (missing file, bad format, connectivity).
  • RDV load failure (constraint violation, key mismatch, DV pattern violation).
  • Interface Layer failure (schema mismatch, mapping metadata issue, transformation error).
  • 360-layer failure (aggregations, joins, pre-aggregated tables, materialized views) covering Customer 360, Campaign 360, and Flowchart 360.
  • ML Model ingestion / enrichment failure (cdm_ingest_db write-back or Customer 360 enrichment step).
Runbook 3. Rerun and Backfill Runbook
  • How to safely re-run a failed job for a specific date/batch.
  • How to handle partial loads.
  • Preconditions and post-checks for reruns.
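
The precondition and post-check pattern for reruns can be sketched as a small wrapper that refuses to run the job until every precondition passes. This is a minimal sketch, not the team's actual rerun procedure; the function name, the check names, and the batch date are hypothetical.

```python
def rerun_with_checks(batch_date, preconditions, run_job, post_checks):
    """Run a job for one batch date only if every precondition passes.

    preconditions / post_checks: lists of (name, callable) pairs; each callable
    takes batch_date and returns True when the check passes.
    run_job: callable that performs the actual rerun for batch_date.
    Returns (status, failed_check_names).
    """
    failed = [name for name, check in preconditions if not check(batch_date)]
    if failed:
        return "blocked", failed  # the job is not run

    run_job(batch_date)

    failed = [name for name, check in post_checks if not check(batch_date)]
    return ("needs_review", failed) if failed else ("ok", [])


# Hypothetical checks: the second precondition fails, so the job never runs.
executed = []
status, failed = rerun_with_checks(
    "2024-01-15",
    preconditions=[
        ("source file present", lambda d: True),
        ("no partial load left behind", lambda d: False),
    ],
    run_job=executed.append,
    post_checks=[("row count > 0", lambda d: True)],
)
print(status, failed, executed)
# blocked ['no partial load left behind'] []
```

Returning the names of the failed checks gives the operator something concrete to record or escalate, instead of a bare pass/fail.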
Runbook 4. Release and Configuration Change Runbook
  • How to validate after a code/config deployment.
  • Which basic smoke tests the Service Implementation Team must execute.
Runbook 5. Basic Data Validation Runbook (Non-Business)
  • Row-count comparisons between layers where defined (e.g., LDZ vs RDV).
  • Key technical checks (duplicate keys, nulls in non-null technical columns) when explicitly documented.
  • Flowchart 360 validation: flowchart_id present, execution_count > 0, pre-aggregated metric columns populated.
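
The row-count, duplicate-key, and null checks above could look like the following. This is a sketch under assumptions: it uses an in-memory SQLite database so that it runs anywhere, and the table names (ldz_customer, rdv_customer) and columns (customer_key, load_ts) are hypothetical stand-ins for whatever the runbook documents.

```python
import sqlite3

# In-memory database with hypothetical LDZ and RDV tables and sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ldz_customer (customer_key TEXT, load_ts TEXT);
    CREATE TABLE rdv_customer (customer_key TEXT, load_ts TEXT);
    INSERT INTO ldz_customer VALUES
        ('C1', '2024-01-15'), ('C2', '2024-01-15'), ('C3', '2024-01-15');
    INSERT INTO rdv_customer VALUES
        ('C1', '2024-01-15'), ('C2', '2024-01-15'), ('C2', NULL);
""")


def scalar(sql):
    """Run a query that returns a single value."""
    return conn.execute(sql).fetchone()[0]


ldz_rows = scalar("SELECT COUNT(*) FROM ldz_customer")
rdv_rows = scalar("SELECT COUNT(*) FROM rdv_customer")

# Number of keys that appear more than once in the RDV table.
duplicate_keys = scalar("""
    SELECT COUNT(*) FROM (
        SELECT customer_key FROM rdv_customer
        GROUP BY customer_key HAVING COUNT(*) > 1
    )
""")

# Rows where a non-null technical column is empty.
null_load_ts = scalar("SELECT COUNT(*) FROM rdv_customer WHERE load_ts IS NULL")

print("row counts match:", ldz_rows == rdv_rows)  # True
print("duplicate keys:", duplicate_keys)          # 1
print("nulls in load_ts:", null_load_ts)          # 1
```

In this sample the row counts match even though one key is duplicated and one technical column is null, which is why the row-count comparison alone is not sufficient.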
Aggregate Layer Validation
  • Verify aggregate model refresh completed successfully.
  • Confirm aggregate refresh timestamp is updated.
  • Validate aggregate row counts against expected daily growth.
  • Verify rolling-window metrics (7d, 30d, 90d) are populated.
  • Escalate to Services Team if aggregate refresh fails or metrics appear stale.
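
The freshness and rolling-window checks above can be sketched as a function that returns the reasons to escalate. The 24-hour staleness threshold, the function name, and the sample values are assumptions made for illustration; this section does not define what counts as stale.

```python
from datetime import datetime, timedelta


def aggregate_issues(refresh_ts, now, metrics, max_age=timedelta(hours=24)):
    """Return a list of reasons to escalate; an empty list means the checks passed.

    refresh_ts: last aggregate refresh timestamp.
    metrics: dict of rolling-window name -> value (None when unpopulated).
    max_age: assumed staleness threshold, not defined in this section.
    """
    issues = []
    if now - refresh_ts > max_age:
        issues.append("refresh timestamp is stale")
    for window in ("7d", "30d", "90d"):
        if metrics.get(window) is None:
            issues.append(f"{window} metric not populated")
    return issues


# Illustrative values: last refresh was over two days ago and the 90d metric is empty.
now = datetime(2024, 1, 16, 6, 0)
print(aggregate_issues(datetime(2024, 1, 14, 5, 0), now, {"7d": 120, "30d": 480, "90d": None}))
# ['refresh timestamp is stale', '90d metric not populated']
```

A non-empty result corresponds to the escalation condition in the last bullet above.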
Audience Resolution Validation
  • Verify Audience_map table contains active mappings.
  • Validate Campaign–Offer–Channel–Product bridge records are populated.
  • Confirm audience resolution process completed successfully.
  • Review unresolved audience counts where available.
  • Escalate to Services Team if audience resolution failures impact Customer 360, Campaign 360, or Flowchart 360 outputs.
Runbook 6. Oracle/SQL Server/Snowflake Setup DAG Execution Runbook (Applicable for Cloud-Native Setup)

Required for all Oracle/SQL Server/Snowflake schema environment setups and rebuilds.

Three new Airflow DAGs must be run in sequence during initial Oracle/SQL Server/Snowflake CDM setup or any environment rebuild. The Service Implementation Team must have a documented procedure for each step.
DAG Name | Purpose | When to Run
airflow_variable_sync | Syncs all required Airflow variables for the Oracle environment. | First, before any other DAG on a new setup.
ddl_execution_dag_multidb | Automatically creates all Oracle/SQL Server/Snowflake schemas, tables, and views. | Once, during initial Oracle/SQL Server/Snowflake CDM setup or rebuild.
etl_date_control_update_dag | Updates the ETL date control table with the correct business dates. | Before every load run (Day 0 and each BAU run).
Important: These DAGs replace manual DDL script execution. The Service Implementation Team must not execute DDL scripts manually when these DAGs are available.
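
The required run order for a new setup or rebuild can be sketched as a small helper that names the next DAG to trigger and rejects out-of-order runs. The DAG IDs come from the table above; the helper functions are hypothetical. The command uses the standard Airflow CLI form `airflow dags trigger <dag_id>`, and the sketch only builds it, it does not execute it.

```python
# DAG IDs and their order are taken from the table above.
SETUP_SEQUENCE = [
    "airflow_variable_sync",
    "ddl_execution_dag_multidb",
    "etl_date_control_update_dag",
]


def next_dag(completed):
    """Return the next DAG to run, or None when the sequence is complete.

    completed: DAG IDs already finished successfully, in the order they ran.
    Raises ValueError if they were not run in the documented order.
    """
    if completed != SETUP_SEQUENCE[:len(completed)]:
        raise ValueError(f"DAGs ran out of sequence: {completed}")
    if len(completed) == len(SETUP_SEQUENCE):
        return None
    return SETUP_SEQUENCE[len(completed)]


def trigger_command(dag_id):
    """Build the Airflow CLI command that triggers a run of dag_id."""
    return ["airflow", "dags", "trigger", dag_id]


print(trigger_command(next_dag(["airflow_variable_sync"])))
# ['airflow', 'dags', 'trigger', 'ddl_execution_dag_multidb']
```

The sketch covers the initial setup sequence only. After setup, etl_date_control_update_dag still has to run before every load, as stated in the table.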