Performance Sizing

Data Volume Overview


Audiences	SRC_CH (Contacts)	SRC_CH (Responses)
100,000	11 Million	6 Million

Airflow (for DAG Orchestration)


CPU	RAM	Disk
8 cores	32 GB	500 GB

Canonical DB (Oracle)


CPU	RAM	Disk	Oracle Node
16 cores	64 GB	500 GB	1

Canonical DB (SQL Server)


CPU	RAM	Disk	SQL Server Node
16 cores	72 GB	750 GB	1

Snowflake sizing is workload-driven and primarily governed by Virtual Warehouse configuration. Initial recommendation is to start with Medium warehouse for ETL and Small for interactive workloads, and scale based on concurrency and query performance.

Note: This is minimum sizing. Actual production environments require higher specifications depending on concurrency, ETL frequency, query complexity, number of DAGs, task concurrency, and data volume.

Tuning Recommendations

For optimal performance, database tuning should be carried out by an experienced Database Administrator (DBA). The following recommendations provide guidance for improving scalability, performance, and reliability.

Core Database Optimization

Cluster & High Availability: Add database nodes and enable Oracle RAC to improve load balancing, reduce contention, and increase throughput.; Ensure high availability through multi-node architecture.
Storage & Compute Scaling: Scale memory to 128 GB–256 GB RAM.; Expand storage beyond 1 TB using SSD/NVMe.; Increase CPU cores to support parallel processing workloads.
Partitioning Strategy: Implement table partitioning to reduce I/O and improve query performance on large datasets.; Use partition pruning to process data in manageable segments.
Optimizer & Statistics: Refresh optimizer statistics regularly to ensure efficient execution plans and consistent performance.

ETL & Data Processing Optimization

Use bulk loading mechanisms to improve ingestion performance.
Implement incremental data loads instead of full refreshes.
Schedule heavy ETL workloads during off-peak hours.
Optimize CSV generation by increasing chunk size.
Use parallel threads aligned with redo log capacity for faster execution.

Backup, Recovery & Monitoring

Perform daily RMAN backups and manage archive logs effectively.
Regularly test recovery procedures to ensure reliability.
Monitor system performance using AWR/ASH reports.
Set alerts for critical thresholds (CPU, memory, I/O, blocking).

Scalability & Future Readiness

Adopt a multi-node architecture for horizontal scalability.
Separate compute and storage layers where applicable.
Ensure schema design supports growth in Customer History (CH) and Response History (RH) datasets.

New Feature Optimization (Flowchart 360)

Optimize queries handling execution-level granular data.
Create indexes on:
- flowchart_id
- campaign_code
- execution timestamps
Pre-aggregate frequently used metrics (clicks, opens, responses) to reduce runtime computation.

ML Model Performance (STO & NBC)

Optimize feature extraction queries from Customer 360.
Use materialized views or precomputed feature tables.
Schedule feature generation during off-peak hours.
Ensure efficient write-back of ML outputs into Customer 360 for downstream use.

Multi-Database Tuning Considerations

Oracle

Continue leveraging:

Oracle RAC for clustering
Table partitioning
AWR/ASH for performance monitoring

Snowflake

Size Virtual Warehouses (X-Small → X-Large) based on workload.

Use auto-suspend (e.g., 60 seconds) to optimize cost.

Apply clustering keys on high-cardinality columns:

party_id,
                            campaign_code, load_date

Use dbt incremental models (MERGE strategy) for RDV and aggregates.

Create materialized views / dynamic tables for Customer 360 and Campaign 360.

Enable query result caching for repeated analytical queries.

Use dedicated warehouses for ETL vs analytical workloads.

Monitor via:

Query History
Information Schema
Resource Monitors

SQL Server

Enable:

In-Memory OLTP
Columnstore Indexes for analytical workloads

Partition tables on: load_date, hub_load_date

Configure:

Max server memory for buffer pool efficiency
TempDB with multiple data files (aligned to CPU cores)

Use:

Bulk insert with TABLOCK
SQL Server Agent / Airflow for orchestration

Maintain statistics with AUTO_UPDATE_STATISTICS

Use Query Store for plan stability

Implement Always On Availability Groups for HA and read scaling

Monitor using:

DMVs
Extended Events
Alerts for CPU, memory, and blocking