PII Protection, Tokenization & AI Data Governance

Executive Summary

This section establishes the foundational governance framework for PII protection, tokenization, and controlled AI data exposure within the Canonical Data Model (CDM) and MaxAI ecosystem.

The primary objective of this release is to ensure that sensitive and personally identifiable information (PII) is protected before it enters governed analytical and AI-serving layers. The governance posture defined in this section is intentionally architecture-driven and focuses on minimizing exposure risk at the ingestion, storage, query, and AI interaction layers.

The framework is based on the following principles:

Raw PII must not enter AI processing layers.
PII must be tokenized prior to ingestion into CDM.
Only analytics-required data should enter governed analytical layers.
MaxAI must operate exclusively on governed, curated, read-only data surfaces.
Governance must be enforced structurally through architecture, not probabilistically through AI filtering.

This release specifically focuses on PII exposure prevention, tokenization enforcement, controlled data onboarding into CDM, and governed exposure to MaxAI.

Governance Overview

Source systems may contain raw PII and sensitive customer information. All approved PII attributes must be identified and tokenized before data enters CDM.

CDM acts as the governed analytical substrate and should not become an unrestricted enterprise storage layer. Only analytics-relevant, governed, approved data is permitted into CDM.

MaxAI operates exclusively on governed curated views and never on raw source tables or token vaults.

Governance is enforced across:

Ingestion & Tokenization: Ensures all inbound data is classified, protected, and tokenized before entering CDM, preventing raw PII exposure within governed analytical and AI-serving layers.
Data Model Governance: Establishes governance controls across model lifecycle management, including change management, versioning, interface mappings, metadata governance, extension management, data quality controls, lineage traceability, PII classification, and controlled release/deployment processes.
RBAC & View-Level Access Control: Restricts access through governed roles and curated views, enforcing masking, row-level security, read-only access, and prevention of unrestricted schema or base-table access.
AI Interaction Layers: Ensures AI services operate only on approved governed datasets with controls around prompt handling, session isolation, output protection, audit logging, and query execution boundaries.
Metadata Governance: Ensures mappings, lineage, DQ rules, and AI exposure controls are centrally governed through managed metadata. For more information, refer to Metadata Model: The Core Integration Layer.

Core Governance Principles

Zero Raw PII in AI: No raw PII may reach MaxAI under any condition.
Tokenization First: All approved PII attributes must be tokenized prior to CDM ingestion.
Data Minimization: Only analytics-required data should enter governed analytical and AI-serving layers.
Read-Only AI: MaxAI cannot insert, update, delete, or mutate underlying datasets.
Least Privilege Access: Access is restricted to curated governed views only.
Client Ownership: The client owns classification, exposure approval, and risk acceptance decisions.
Architecture-Led Security: Governance is enforced structurally through architecture, not AI intelligence alone.

PII Exposure Risk Landscape

Historically, Martech ecosystems frequently operated through extracts, spreadsheets, local flat files, and manually managed datasets due to operational realities and slow centralized data access.

AI fundamentally changes this governance model because AI systems:

Operate at machine speed
Accumulate conversational context
Execute multi-step analytical workflows
Generate derived insights
Operate across larger contextual surfaces than human users

Key exposure vectors include:

Improper upstream ingestion of raw PII
Misconfigured governed views
Prompt-level PII injection
Aggregation-based re-identification
Cross-system joins
Session context accumulation
Export and download surfaces
AI-generated analytical outputs

PII Discovery & Classification

PII discovery is a mandatory governance-controlled exercise conducted during onboarding and due diligence.

The platform does not automatically classify or infer PII. Classification ownership remains with client governance and compliance stakeholders.

The Professional Services (PS) team facilitates:

Initial discovery workshops
Candidate attribute identification
Governance review preparation

Every onboarded attribute must be classified into one of the following categories:

Direct Identifier
Linkable
Sensitive
Non-PII

The classification becomes the foundation for tokenization, exposure, and AI governance decisions.

Tokenization & Obfuscation Policy

All approved PII attributes must be tokenized before ingestion into CDM.

Supported tokenization approaches include:

Vault-based tokenization
Salted irreversible hashing
Controlled masking
Format-preserving tokenization where required

The token vault must:

Exist outside CDM
Remain inaccessible to MaxAI
Be governed by client-controlled security infrastructure

Tokens must:

Be non-reversible within CDM and MaxAI
Avoid semantic encoding
Remain consistent where analytical joinability is required
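
The token properties above can be illustrated with a minimal sketch. It uses HMAC-SHA256 (a keyed hash from Python's standard library) as a stand-in for the salted irreversible hashing option. The function name tokenize, the tok_ prefix, the domain parameter, and the normalisation rule are assumptions made for illustration and are not part of the platform. In practice the key would be held in client-controlled infrastructure outside CDM.

```python
import hashlib
import hmac


def tokenize(value: str, secret_key: bytes, domain: str = "email") -> str:
    """Return a deterministic, non-reversible token for a PII value.

    Hypothetical helper: the name, the prefix, and the normalisation
    rule are illustrative assumptions, not a platform API.
    """
    # Normalise so the same logical value always yields the same token,
    # which keeps tokens consistent where analytical joins are required.
    normalised = value.strip().lower()
    # Namespacing by attribute domain keeps tokens of different types apart.
    message = f"{domain}:{normalised}".encode("utf-8")
    digest = hmac.new(secret_key, message, hashlib.sha256).hexdigest()
    # A fixed prefix avoids encoding any meaning from the source value.
    return f"tok_{digest}"
```

Because the same key and input always produce the same token, two tokenized datasets can still be joined on the token, while the original value cannot be recovered without access to the key and the source data.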

Attribute-Level Governance Model

Every attribute entering CDM must carry governance metadata including:

PII classification
Tokenization requirement
Tokenization type
LLM shareability
Exposure classification
Regulatory sensitivity indicators

This metadata becomes the foundational governance contract for downstream AI exposure control.

Example exposure models:

RAW
AGGREGATED_ONLY
SUPPRESSED

Sensitive data and data that is not required for marketing purposes should not enter CDM.
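
As a rough illustration of how this governance metadata could be represented, the sketch below models one attribute record in Python. The class and field names are hypothetical. The interpretation of RAW as "row-level values may be exposed as stored in CDM" is an assumption; the exposure model names themselves come from the list above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class PiiCategory(Enum):
    DIRECT_IDENTIFIER = "Direct Identifier"
    LINKABLE = "Linkable"
    SENSITIVE = "Sensitive"
    NON_PII = "Non-PII"


class Exposure(Enum):
    RAW = "RAW"                          # assumed: row-level values may be exposed
    AGGREGATED_ONLY = "AGGREGATED_ONLY"  # exposed only inside aggregates
    SUPPRESSED = "SUPPRESSED"            # never exposed


@dataclass(frozen=True)
class AttributeGovernance:
    """Hypothetical governance record for one CDM attribute."""
    name: str
    pii_category: PiiCategory
    tokenization_required: bool
    tokenization_type: Optional[str]
    llm_shareable: bool
    exposure: Exposure
    regulatory_flags: Tuple[str, ...] = ()


def row_level_allowed(attr: AttributeGovernance) -> bool:
    """Row-level exposure to the AI layer needs both flags to agree."""
    return attr.llm_shareable and attr.exposure is Exposure.RAW
```

Keeping the record immutable (frozen) reflects the idea that the metadata acts as a contract: a change would go through governance review rather than being edited in place.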

Data Access & Exposure Controls

MaxAI operates exclusively on curated governed database views.

The following are prohibited:

Direct base table access
Token vault access
Schema browsing
Unrestricted joins
Write operations

Governance controls include:

Row-level security
Column-level masking
RBAC
Read-only database roles
View-level governance
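
The prohibitions above are enforced primarily through database roles and view grants. As a complementary illustration only, the sketch below shows a naive application-side pre-check that accepts a single SELECT statement referencing allow-listed views. The view names and the function name are hypothetical, and the regular expression is not a SQL parser: it fails closed on some valid queries and does not detect comma-separated table lists.

```python
import re

# Hypothetical governed view names, for illustration only.
ALLOWED_VIEWS = {"v_customer_360", "v_campaign_360"}

# Captures the identifier that follows FROM or JOIN.
_TABLE_REF = re.compile(r"\b(?:from|join)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def is_query_allowed(sql: str, allowed_views=ALLOWED_VIEWS) -> bool:
    """Naive pre-check; database-level roles remain the real control."""
    statement = sql.strip().rstrip(";").strip()
    # Reject stacked statements such as "SELECT 1; DROP TABLE x".
    if ";" in statement:
        return False
    # Read-only boundary: only SELECT statements pass.
    if not re.match(r"select\b", statement, re.IGNORECASE):
        return False
    referenced = {name.lower() for name in _TABLE_REF.findall(statement)}
    allowed = {view.lower() for view in allowed_views}
    # Every referenced relation must be a governed view.
    return bool(referenced) and referenced <= allowed
```

A check like this is useful mainly for early, readable rejections and audit messages. It should not be relied on as the security boundary.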

Prompt-Level PII Exposure Governance

Prompt-level risk represents one of the most important new governance areas introduced by AI-assisted systems.

Users may intentionally or unintentionally submit PII into prompts. Such scenarios bypass upstream tokenization controls.

Current governance focus areas include:

Prompt audit logging
Session traceability
Architectural containment
Read-only execution boundaries
User awareness and governance discipline

This release does not position AI moderation as a deterministic security control.

Session and Memory Governance

AI systems introduce a new governance category referred to as data-in-conversation.

Traditional governance focused primarily on:

Data-at-rest
Data-in-transit

AI systems additionally require governance of:

Session context
Prompt histories
Derived analytical context
Multi-turn accumulation

For this release:

Session isolation is mandatory
Context persistence is minimized
No unrestricted shared memory exists
Session and prompt traceability are retained for audit purposes
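
A minimal sketch of these session rules is shown below. The class name, the turn limit, and the audit record shape are assumptions for illustration. The audit log here stores prompts in memory; a real audit store would be durable and access-controlled, since prompts may themselves contain PII.

```python
from collections import deque
from datetime import datetime, timezone


class SessionStore:
    """Hypothetical per-session context store with an append-only audit log."""

    def __init__(self, max_turns: int = 5):
        self._max_turns = max_turns
        self._contexts = {}   # session_id -> bounded deque of prompts
        self.audit_log = []   # retained for traceability

    def record_prompt(self, session_id: str, prompt: str) -> None:
        # Each session gets its own bounded context; nothing is shared.
        context = self._contexts.setdefault(
            session_id, deque(maxlen=self._max_turns)
        )
        context.append(prompt)
        self.audit_log.append({
            "session_id": session_id,
            "at": datetime.now(timezone.utc).isoformat(),
            "prompt": prompt,
        })

    def context_for(self, session_id: str) -> list:
        # A session can only ever read its own context.
        return list(self._contexts.get(session_id, ()))

    def end_session(self, session_id: str) -> None:
        # Context is discarded at session end; the audit trail remains.
        self._contexts.pop(session_id, None)
```

The bounded deque keeps context persistence small, while the separate audit list shows that traceability does not depend on keeping conversational context alive.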

AI Output Governance

AI-generated outputs can create indirect PII exposure even when upstream tokenization exists.

Governance controls therefore include:

Aggregation thresholds
Suppression rules
Derived PII monitoring
Export governance
Query narrowing protections
Output redaction

Minimum aggregation rules are enforced to reduce re-identification risk from small cohorts or unique analytical patterns.
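
The minimum aggregation rule can be sketched as a simple suppression step applied before results are returned. The threshold of 10 and the function name are assumptions; the actual threshold would be set through client governance.

```python
from collections import Counter

# Hypothetical threshold; the real value is a governance decision.
MIN_COHORT_SIZE = 10


def cohort_counts(records, key, min_size=MIN_COHORT_SIZE):
    """Count records per cohort and suppress cohorts below min_size."""
    counts = Counter(record[key] for record in records)
    # Small cohorts are dropped because they raise re-identification risk.
    return {cohort: n for cohort, n in counts.items() if n >= min_size}
```

For example, with 12 records in segment A and 3 in segment B, only segment A would appear in the result. The count for B is withheld entirely rather than being reported as a small number.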

Runtime, Capacity & Operational Governance

AI governance is also an operational governance problem.

Unbounded AI workloads can:

Increase opportunities for inference and re-identification
Create retry-loop vulnerabilities
Bypass audit visibility
Degrade governance enforcement

Operational governance therefore includes:

Query complexity limits
Session concurrency controls
Capacity-aware throttling
Agent execution constraints
Audit preservation priorities
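
As one illustration of session concurrency controls, the sketch below caps the number of in-flight queries per session and overall. The class name and the limits are hypothetical; a production system would typically enforce such limits in the serving or database layer.

```python
import threading
from collections import defaultdict


class QueryLimiter:
    """Hypothetical limiter for concurrent AI-issued queries."""

    def __init__(self, max_per_session: int = 2, max_total: int = 10):
        self._max_per_session = max_per_session
        self._max_total = max_total
        self._active = defaultdict(int)   # session_id -> in-flight queries
        self._lock = threading.Lock()

    def try_start(self, session_id: str) -> bool:
        """Return True and reserve a slot, or False if a limit is reached."""
        with self._lock:
            if sum(self._active.values()) >= self._max_total:
                return False
            if self._active[session_id] >= self._max_per_session:
                return False
            self._active[session_id] += 1
            return True

    def finish(self, session_id: str) -> None:
        """Release a slot previously reserved by try_start."""
        with self._lock:
            if self._active[session_id] > 0:
                self._active[session_id] -= 1
```

Rejecting a request outright, instead of queueing it without bound, keeps retry loops visible to the caller and to audit logging.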

Integration & Ecosystem Governance

Any system integrating with CDM or MaxAI becomes part of the governance perimeter.

This includes:

BI tools
APIs
Streaming integrations
Third-party connectors
External analytical platforms

All integrations must:

Consume governed views only
Respect tokenization controls
Operate within approved access scopes
Undergo governance review before onboarding

Compliance & Accountability

Client Responsibilities:

PII classification
Exposure approval
Regulatory interpretation
Token vault governance
Operational monitoring

Platform Responsibilities:

Enforcement architecture
View-based governance
Read-only AI execution
Session isolation
Audit traceability

Joint Responsibilities:

Governance reviews
Integration validation
Incident response coordination

Strategic Positioning

This governance framework is intentionally architecture-led and governance-first.

The objective is not to position AI as inherently secure.

The objective is to build governed AI operating boundaries.

This release focuses specifically on:

PII protection
Tokenization enforcement
Controlled analytical exposure
Governed onboarding into CDM
Read-only AI execution

Broader enterprise AI governance capabilities will evolve incrementally in future roadmap phases.

Final Governance Position

CDM should be treated as a governed analytical substrate — not an unrestricted enterprise data lake.

MaxAI should be treated as a governed intelligence layer — not an autonomous unrestricted AI system.

Security and compliance are enforced through the following, not through AI intelligence alone:

Architecture
Governance
Controlled exposure
Tokenization
Operational discipline

CDM Data Acceptance Policy

CDM is a governed analytical substrate — not a general-purpose enterprise data store. Only data that serves a direct Martech, Customer 360, Campaign 360, Flowchart 360, segmentation, personalisation, or AI analytics purpose is permitted to enter CDM. This policy defines what is accepted and what is explicitly rejected.

CDM ACCEPTS:

Martech-relevant customer profile attributes required for segmentation and personalisation
Consent and preference data (with source, effective date, channel-level precedence, and status)
Campaign metadata (goals, budget, dates, state, offer and channel relationships)
Contact history and response history relevant to campaign analytics
Offer, channel, and product relationship data
Analytical aggregates required for Customer 360, Campaign 360, and Flowchart 360
Tokenized identifiers required for analytical joins

CDM DOES NOT ACCEPT:

Raw email addresses, phone numbers, postal addresses, national IDs, or KYC documents
Free-text notes or comments containing uncontrolled PII
Unrestricted source system dumps without a documented Martech use case
Token vault data or raw cryptographic keys
Non-marketing operational data with no Customer 360, Campaign 360 or Flowchart 360 use case
Sensitive data not required for segmentation, personalisation, reporting, or analytics

Note: "Martech-relevant" means data directly needed for audience selection, campaign execution, personalisation, analytics, or AI-driven decisioning within the MaxAI/Unica ecosystem. Any attribute that cannot be justified under these purposes must not enter CDM.

Attribute-Level Metadata Requirements for Onboarding

Every attribute onboarded into CDM must carry the following governance metadata. This metadata forms the implementation's PII matrix and feeds directly into the interface document. The fields below are mandatory — partial population blocks onboarding.

Source system name
Source column name
CDM target attribute
Business meaning / definition
PII category (Direct Identifier / Linkable / Sensitive / Non-PII)
Tokenization required: Y/N
Tokenization method (vault-based / salted hash / format-preserving / masking)
Allowed in 360 view: Y/N
Allowed for MaxAI / analytics exposure: Y/N
Aggregation-only flag: Y/N (attribute exposed only as aggregate, never row-level)
DQ rule reference
Lineage mapping (LDZ column → RDV table/column → 360 view)
Owner / data steward from implementation side

This metadata is captured in the PII Classification Matrix template.
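
The rule that partial population blocks onboarding could be checked mechanically. The sketch below is a minimal illustration; the field keys are hypothetical snake_case renderings of the list above, not the column names of the actual PII Classification Matrix template.

```python
# Hypothetical keys mirroring the mandatory metadata fields listed above.
MANDATORY_FIELDS = (
    "source_system",
    "source_column",
    "cdm_target_attribute",
    "business_definition",
    "pii_category",
    "tokenization_required",
    "tokenization_method",
    "allowed_in_360_view",
    "allowed_for_maxai",
    "aggregation_only",
    "dq_rule_reference",
    "lineage_mapping",
    "data_steward",
)


def missing_fields(entry: dict) -> list:
    """Return mandatory fields that are absent or empty in a matrix entry."""
    # False is a valid answer for Y/N fields, so only None and "" count as empty.
    return [f for f in MANDATORY_FIELDS if entry.get(f) in (None, "")]


def can_onboard(entry: dict) -> bool:
    """An attribute may be onboarded only when nothing is missing."""
    return not missing_fields(entry)
```

Returning the list of missing fields, instead of a bare yes or no, gives the data steward a direct pointer to what still needs to be completed.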

Audience Resolution Governance Guardrails

The Audience Resolution Layer resolves campaign participation across customer, account, and device levels into a unified customer identifier. Because this layer deals with customer resolution and cross-level mapping, it creates re-identification risk if not governed correctly.

Audience identifiers must be tokenized before entering CDM — no raw customer, account, or device IDs in the resolution layer
Customer / account / device mappings must not expose raw identifiers in any downstream view
Resolution logic must be metadata-driven (via Audience_map table) and fully auditable
Unresolved audience records must be quarantined or flagged — not silently dropped or passed through
Mapping confidence and resolution status must be captured as metadata per record
Downstream 360 views must consume resolved canonical customer keys only — not raw source audience identifiers
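
The guardrails above can be sketched as a resolution step that never drops a record silently. The record shape, the status values, and the in-memory dictionary standing in for the Audience_map table are assumptions made for illustration.

```python
def resolve_audience(records, audience_map):
    """Split records into resolved and quarantined lists.

    audience_map is a hypothetical stand-in for the Audience_map table:
    (level, audience_token) -> (customer_key, mapping_confidence).
    All identifiers are expected to be tokenized already.
    """
    resolved, quarantined = [], []
    for record in records:
        key = (record["level"], record["audience_token"])
        match = audience_map.get(key)
        if match is None:
            # Unresolved records are flagged, not dropped or passed through.
            quarantined.append({**record, "resolution_status": "UNRESOLVED"})
        else:
            customer_key, confidence = match
            resolved.append({
                **record,
                "customer_key": customer_key,
                "mapping_confidence": confidence,
                "resolution_status": "RESOLVED",
            })
    return resolved, quarantined
```

Every input record ends up in exactly one of the two lists, which makes the resolution outcome auditable by simple counts.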

Consent Model Coverage in CDM

The following consent attributes are in scope for the current release. Items marked as backlog are targeted for Release 26.2.

Consent source — in scope
Consent effective date — in scope
Consent status (current, expired) — in scope via expiry date
Consent use in segmentation and activation — in scope
Consent lineage from source to 360 — in scope via metadata lineage
Channel-level consent precedence — backlog (26.2)
Consent revocation handling — backlog (26.2); processing statement dependency noted
Missing consent default behaviour — backlog (26.2)
Stale consent handling — backlog (26.2)
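
For the in-scope item "consent status via expiry date", a minimal sketch of the derivation is shown below. The function names and record keys are hypothetical, and treating consent as still current on the expiry date itself is an assumption. Revocation, missing consent, and stale consent are backlog items and are deliberately not modelled.

```python
from datetime import date


def consent_status(expiry_date: date, as_of: date) -> str:
    """Derive consent status from the expiry date alone.

    Assumption: consent remains current through the expiry date (inclusive).
    """
    return "expired" if as_of > expiry_date else "current"


def eligible_for_activation(records, as_of: date):
    """Keep only records whose consent is current on the given date."""
    return [
        record for record in records
        if consent_status(record["consent_expiry_date"], as_of) == "current"
    ]
```

Passing the evaluation date explicitly, instead of reading the system clock inside the function, keeps segmentation results reproducible for audit.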