Performance Sizing

Data Volume Overview

Audiences SRC_CH (Contacts) SRC_CH (Responses)
100,000 11 Million 6 Million

Airflow (for DAG Orchestration)

CPU RAM Disk
8 cores 32 GB 500 GB

Canonical DB (Oracle)

CPU RAM Disk Oracle Node
16 cores 64 GB 500 GB 1

Canonical DB (SQL Server)

CPU RAM Disk SQL Server Node
16 cores 72 GB 750 GB 1

Snowflake sizing is workload-driven and primarily governed by Virtual Warehouse configuration. Initial recommendation is to start with Medium warehouse for ETL and Small for interactive workloads, and scale based on concurrency and query performance.

Note: This is minimum sizing. Actual production environments require higher specifications depending on concurrency, ETL frequency, query complexity, number of DAGs, task concurrency, and data volume.

Tuning Recommendations

For optimal performance, database tuning should be carried out by an experienced Database Administrator (DBA). The following recommendations provide guidance for improving scalability, performance, and reliability.

Core Database Optimization

Cluster & High Availability
Add database nodes and enable Oracle RAC to improve load balancing, reduce contention, and increase throughput.
Ensure high availability through multi-node architecture.
Storage & Compute Scaling
Scale memory to 128 GB–256 GB RAM.
Expand storage beyond 1 TB using SSD/NVMe.
Increase CPU cores to support parallel processing workloads.
Partitioning Strategy
Implement table partitioning to reduce I/O and improve query performance on large datasets.
Use partition pruning to process data in manageable segments.
Optimizer & Statistics
Refresh optimizer statistics regularly to ensure efficient execution plans and consistent performance.

ETL & Data Processing Optimization

  • Use bulk loading mechanisms to improve ingestion performance.
  • Implement incremental data loads instead of full refreshes.
  • Schedule heavy ETL workloads during off-peak hours.
  • Optimize CSV generation by increasing chunk size.
  • Use parallel threads aligned with redo log capacity for faster execution.

Backup, Recovery & Monitoring

  • Perform daily RMAN backups and manage archive logs effectively.
  • Regularly test recovery procedures to ensure reliability.
  • Monitor system performance using AWR/ASH reports.
  • Set alerts for critical thresholds (CPU, memory, I/O, blocking).

Scalability & Future Readiness

  • Adopt a multi-node architecture for horizontal scalability.
  • Separate compute and storage layers where applicable.
  • Ensure schema design supports growth in Customer History (CH) and Response History (RH) datasets.

New Feature Optimization (Flowchart 360)

  • Optimize queries handling execution-level granular data.
  • Create indexes on:
    • flowchart_id
    • campaign_code
    • execution timestamps
  • Pre-aggregate frequently used metrics (clicks, opens, responses) to reduce runtime computation.

ML Model Performance (STO & NBC)

  • Optimize feature extraction queries from Customer 360.
  • Use materialized views or precomputed feature tables.
  • Schedule feature generation during off-peak hours.
  • Ensure efficient write-back of ML outputs into Customer 360 for downstream use.

Multi-Database Tuning Considerations

Oracle
Continue leveraging:
  • Oracle RAC for clustering
  • Table partitioning
  • AWR/ASH for performance monitoring
Snowflake
Size Virtual Warehouses (X-Small → X-Large) based on workload.
Use auto-suspend (e.g., 60 seconds) to optimize cost.
Apply clustering keys on high-cardinality columns: party_id, campaign_code, load_date
Use dbt incremental models (MERGE strategy) for RDV and aggregates.
Create materialized views / dynamic tables for Customer 360 and Campaign 360.
Enable query result caching for repeated analytical queries.
Use dedicated warehouses for ETL vs analytical workloads.
Monitor via:
  • Query History
  • Information Schema
  • Resource Monitors
SQL Server
Enable:
  • In-Memory OLTP
  • Columnstore Indexes for analytical workloads
Partition tables on: load_date, hub_load_date
Configure:
  • Max server memory for buffer pool efficiency
  • TempDB with multiple data files (aligned to CPU cores)
Use:
  • Bulk insert with TABLOCK
  • SQL Server Agent / Airflow for orchestration
Maintain statistics with AUTO_UPDATE_STATISTICS
Use Query Store for plan stability
Implement Always On Availability Groups for HA and read scaling
Monitor using:
  • DMVs
  • Extended Events
  • Alerts for CPU, memory, and blocking