Data Quality and PII Guidance

Every mapping row in the MID should have at least one data quality rule. Every source column containing personally identifiable information must be flagged in the PII column.

Data Quality Rules

DQ rule type | Applies to | Example rule
NOT NULL | All mandatory fields (PK components, rec_src) | partyid IS NOT NULL
Uniqueness | Natural key combinations | (partyid, rec_src) must be unique in ldz_party_demographic
Domain / Code check | Any field with a closed code set | gender IN ('MALE','FEMALE','UNKNOWN')
Date range | Date of birth, open date, transaction date | dob BETWEEN DATE '1900-01-01' AND SYSDATE
Date sequence | Start and end date pairs | close_dt >= open_dt (when both are not null)
Format | PAN, email, phone, currency codes | PAN matches regex [A-Z]{5}[0-9]{4}[A-Z]{1}
Referential | FK relationships between entities | acct_num in account_party must exist in account_dtl
Non-negative | Amounts, balances, rates | txn_amt >= 0
Length | Fixed-length fields | AADHAAR length = 12
Completeness | Mandatory fields for active records | open_dt IS NOT NULL WHERE acct_sta = 'ACTIVE'
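
As an illustration only, the row-level rule types in the table above can be sketched as simple Python checks. This is a minimal sketch under stated assumptions, not the project's DQ engine: records are assumed to be plain dicts, dates are assumed to be datetime.date values, and the names check_record, duplicate_keys, and the lowercase key pan are hypothetical. The Referential and Length rules are left out because they need a second table or source-specific length definitions.

```python
import re
from datetime import date

# Code set and pattern copied from the example rules in the table.
GENDER_CODES = {"MALE", "FEMALE", "UNKNOWN"}
PAN_PATTERN = re.compile(r"[A-Z]{5}[0-9]{4}[A-Z]{1}")


def check_record(rec, today=None):
    """Return the names of the DQ rules that one record violates."""
    today = today or date.today()
    failures = []

    # NOT NULL: mandatory key component.
    if rec.get("partyid") is None:
        failures.append("NOT NULL: partyid")

    # Domain / code check: nulls are left to the NOT NULL rule.
    gender = rec.get("gender")
    if gender is not None and gender not in GENDER_CODES:
        failures.append("Domain: gender")

    # Date range: dob between 1900-01-01 and the run date.
    dob = rec.get("dob")
    if dob is not None and not (date(1900, 1, 1) <= dob <= today):
        failures.append("Date range: dob")

    # Date sequence: only evaluated when both dates are present.
    open_dt, close_dt = rec.get("open_dt"), rec.get("close_dt")
    if open_dt is not None and close_dt is not None and close_dt < open_dt:
        failures.append("Date sequence: close_dt")

    # Format: the whole value must match the pattern.
    pan = rec.get("pan")
    if pan is not None and not PAN_PATTERN.fullmatch(pan):
        failures.append("Format: pan")

    # Non-negative amounts.
    amt = rec.get("txn_amt")
    if amt is not None and amt < 0:
        failures.append("Non-negative: txn_amt")

    # Completeness: open_dt is mandatory for active accounts.
    if rec.get("acct_sta") == "ACTIVE" and open_dt is None:
        failures.append("Completeness: open_dt")

    return failures


def duplicate_keys(records):
    """Uniqueness: return (partyid, rec_src) pairs seen more than once."""
    seen, dupes = set(), set()
    for rec in records:
        key = (rec.get("partyid"), rec.get("rec_src"))
        if key in seen:
            dupes.add(key)
        seen.add(key)
    return dupes
```

Each rule other than NOT NULL and Completeness skips null values, so a missing field is reported once by the rule that owns it rather than by every rule that reads it.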

PII Classification

Every source column that contains personally identifiable information must be flagged in the PII column of the MID. The CDM metadata layer uses this flag to apply masking, encryption, or access control at the ETL stage.

PII category | Examples | Required treatment
Direct identifier | Full name, email, phone, national ID, PAN, passport | Mark as PII=Yes. Apply tokenisation or encryption at rest.
Quasi-identifier | Date of birth, gender, postcode (low risk alone; in combination they can re-identify an individual) | Mark as PII=Quasi. Apply data governance controls.
Sensitive category | Biometric data (Aadhaar), health, financial account details | Mark as PII=Sensitive. Apply masking + strict access control.
Non-PII | Account type, transaction date, product code, amounts | No PII flag required. Standard controls apply.
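
The classification above lends itself to a simple lookup. The sketch below is illustrative only: the flag values and treatments are transcribed from the table, while the COLUMN_CATEGORY assignments and the pii_flag function are hypothetical and do not represent how the CDM metadata layer is actually implemented.

```python
# Flag value and required treatment per category, transcribed from the table.
PII_TREATMENT = {
    "Direct identifier": ("Yes", "tokenisation or encryption at rest"),
    "Quasi-identifier": ("Quasi", "data governance controls"),
    "Sensitive category": ("Sensitive", "masking + strict access control"),
    "Non-PII": (None, "standard controls"),
}

# Hypothetical column-to-category assignments, for illustration only.
COLUMN_CATEGORY = {
    "email": "Direct identifier",
    "dob": "Quasi-identifier",
    "product_code": "Non-PII",
}


def pii_flag(column):
    """Return (flag, treatment) for a column; raise if it is unclassified."""
    try:
        category = COLUMN_CATEGORY[column]
    except KeyError:
        raise ValueError(f"column {column!r} has no PII classification") from None
    return PII_TREATMENT[category]
```

Raising on an unclassified column, rather than defaulting to Non-PII, is a deliberate choice in this sketch: it keeps a column that nobody has reviewed from silently passing through with standard controls.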

Extended MID Metadata Requirements

The standard MID structure covers core mapping fields. The following additional metadata columns must also be completed for every mapped attribute to support data governance, privacy controls, access management, and analytics exposure decisions.

MID metadata field | Guidance | Mandatory?
PII Category | Classify the field using the taxonomy from Section 12.2: Direct Identifier, Quasi-identifier, Sensitive Category, or Non-PII. This is distinct from the binary PII=Yes/No flag and provides the specific category required by regulatory and masking frameworks. | Yes - all fields
Tokenisation Required (Y/N) | Indicate Y or N. Set to Y for any Direct Identifier or Sensitive Category field that requires a reversible pseudonym so that the original value can be retrieved under controlled access. Set to N for fields that are masked, encrypted at rest only, or carry no PII. | Yes - PII fields
Tokenisation Method | Required only when Tokenisation Required = Y. Specify the method: FORMAT_PRESERVING (FPE), HASH_SHA256, AES_ENCRYPT, or VAULT_TOKEN. The choice must align with the project’s data security architecture. Confirm the method with the security architect before populating. Because HASH_SHA256 is one-way, also confirm how the original value will be retrieved if that method is chosen. | Conditional (if Y above)
Allowed in 360 View (Y/N) | Indicate Y or N. Controls whether this attribute is surfaced in Customer 360, Campaign 360, or Flowchart 360 views. Set to N for attributes that must remain in the LDZ/RDV layers only (e.g. raw operational codes, interim staging fields, or attributes pending business sign-off). Default is Y for all fully validated CDM attributes. | Yes - all fields
Allowed for MaxAI / Analytics Exposure (Y/N) | Indicate Y or N. Controls whether this attribute may be passed to MaxAI agents (Segmentation, Offer, Content) or exposed in analytics dashboards and reports. Set to N for attributes whose sensitivity level, regulatory classification, or business decision prohibits AI model consumption or broad analytical access. Must be confirmed by the business data steward and, for PII fields, the Data Protection Officer. | Yes - all fields
Aggregation-Only Flag (Y/N) | Indicate Y or N. Set to Y when the attribute may only be consumed as part of an aggregated metric (e.g. total_transaction_count, average_balance) and must never be returned at individual record level in queries, reports, or AI feature vectors. Typically applied to sensitive financial or health-adjacent fields where regulatory or data minimisation requirements prohibit row-level exposure. | Yes - all fields
Lineage Mapping | Document the end-to-end column-level lineage path in the format: SOURCE_SYSTEM.SOURCE_TABLE.SOURCE_COLUMN → LDZ_TABLE.LDZ_COLUMN → RDV_TABLE.RDV_COLUMN → BDV_TABLE.BDV_COLUMN. While the MID is the system of record for lineage, each row must carry this explicit path to enable automated lineage catalogue population and impact analysis. At minimum, the source-to-LDZ leg must be populated before sign-off; the remaining legs should be completed by the ETL lead during engineering. | Yes (source-to-LDZ minimum)
Owner / Steward (Implementation Side) | Named individual from the implementation team who is accountable for the correctness of this mapped attribute. This is the person to contact if a data issue, regulatory query, or change request arises after go-live. Populate with full name and role (e.g. “Priya Mehta, CDM Data Architect”). Distinct from the client-side business steward who signs off semantics in the Review Gate. | Yes - all fields
Note: Fields for Source System, Source Column, CDM Target Attribute, Business Meaning, and DQ Rule Reference are already required in the core MID structure and are not repeated here. The table above documents only the fields that were previously missing from the standard MID template.
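
The mandatory and conditional rules for the extended metadata fields can be checked mechanically before sign-off. The following is a minimal sketch, assuming each MID row is available as a Python dict. The key names (pii_category, tokenisation_required, and so on), the validate_row function, and the identifier pattern used for lineage legs are all hypothetical; adapt them to the actual MID template.

```python
import re

PII_CATEGORIES = {"Direct Identifier", "Quasi-identifier", "Sensitive Category", "Non-PII"}
TOKEN_METHODS = {"FORMAT_PRESERVING", "HASH_SHA256", "AES_ENCRYPT", "VAULT_TOKEN"}
# Y/N flags that are mandatory for all fields (hypothetical key names).
ALL_FIELD_FLAGS = ["allowed_in_360", "allowed_for_analytics", "aggregation_only"]

# Assumed identifier shape; the source leg is SYSTEM.TABLE.COLUMN,
# every later leg is TABLE.COLUMN.
NAME = r"[A-Za-z_][A-Za-z0-9_]*"
SOURCE_LEG = re.compile(rf"{NAME}\.{NAME}\.{NAME}")
TABLE_LEG = re.compile(rf"{NAME}\.{NAME}")


def validate_row(row):
    """Return a list of problems found in one extended-metadata row."""
    errors = []

    category = row.get("pii_category")
    if category not in PII_CATEGORIES:
        errors.append("pii_category: not a recognised category")

    for field in ALL_FIELD_FLAGS:
        if row.get(field) not in ("Y", "N"):
            errors.append(f"{field}: must be Y or N")

    # Tokenisation Required is mandatory for PII fields only.
    tok = row.get("tokenisation_required")
    if category in PII_CATEGORIES - {"Non-PII"} and tok not in ("Y", "N"):
        errors.append("tokenisation_required: must be Y or N for PII fields")

    # Tokenisation Method is conditional on Tokenisation Required = Y.
    if tok == "Y" and row.get("tokenisation_method") not in TOKEN_METHODS:
        errors.append("tokenisation_method: required when tokenisation_required = Y")

    # Lineage: source-to-LDZ leg at minimum, up to source, LDZ, RDV, BDV.
    legs = [leg.strip() for leg in (row.get("lineage") or "").split("→")]
    if len(legs) < 2 or len(legs) > 4:
        errors.append("lineage: need source-to-LDZ at minimum, four legs at most")
    elif not SOURCE_LEG.fullmatch(legs[0]) or not all(
        TABLE_LEG.fullmatch(leg) for leg in legs[1:]
    ):
        errors.append("lineage: malformed leg")

    if not (row.get("owner") or "").strip():
        errors.append("owner: named implementation-side steward is required")

    return errors
```

For example, a row with lineage "CRM.PARTY.EMAIL → ldz_party_demographic.email" (the source names here are invented) passes the lineage check, while a row carrying only "CRM.PARTY.EMAIL" is rejected because the source-to-LDZ leg is missing.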