PII Protection, Tokenization & AI Data Governance

Executive Summary

This section establishes the foundational governance framework for PII protection, tokenization, and controlled AI data exposure within the Canonical Data Model (CDM) and MaxAI ecosystem.

The primary objective of this release is to ensure that sensitive and personally identifiable information (PII) is protected before it enters governed analytical and AI-serving layers. The governance posture defined in this section is intentionally architecture-driven and focuses on minimizing exposure risk at the ingestion, storage, query, and AI interaction layers.

The framework is based on the following principles:
  • Raw PII must not enter AI processing layers.
  • PII must be tokenized prior to ingestion into CDM.
  • Only analytics-required data should enter governed analytical layers.
  • MaxAI must operate exclusively on governed, curated, read-only data surfaces.
  • Governance must be enforced structurally through architecture, not probabilistically through AI filtering.

This release specifically focuses on PII exposure prevention, tokenization enforcement, controlled data onboarding into CDM, and governed exposure to MaxAI.

Governance Overview

Source systems may contain raw PII and sensitive customer information. All PII attributes approved for onboarding must be identified and tokenized before the data enters CDM.

CDM acts as the governed analytical substrate and should not become an unrestricted enterprise storage layer. Only analytics-relevant, governed, approved data is permitted into CDM.

MaxAI operates exclusively on governed curated views and never on raw source tables or token vaults.

Governance is enforced across:
  • Ingestion & Tokenization: Ensures all inbound data is classified, protected, and tokenized before entering CDM, preventing raw PII exposure within governed analytical and AI-serving layers.
  • Data Model Governance: Establishes governance controls across model lifecycle management, including change management, versioning, interface mappings, metadata governance, extension management, data quality controls, lineage traceability, PII classification, and controlled release/deployment processes.
  • RBAC & View-Level Access Control: Restricts access through governed roles and curated views, enforcing masking, row-level security, read-only access, and prevention of unrestricted schema or base-table access.
  • AI Interaction Layers: Ensures AI services operate only on approved governed datasets with controls around prompt handling, session isolation, output protection, audit logging, and query execution boundaries.
  • Metadata Governance: Ensures mappings, lineage, DQ rules, and AI exposure controls are centrally governed through managed metadata. For more information, refer to Metadata Model: The Core Integration Layer.

Core Governance Principles

  • Zero Raw PII in AI: No raw PII may reach MaxAI under any condition.
  • Tokenization First: All approved PII attributes must be tokenized prior to CDM ingestion.
  • Data Minimization: Only analytics-required data should enter governed analytical and AI-serving layers.
  • Read-Only AI: MaxAI cannot insert, update, delete, or mutate underlying datasets.
  • Least Privilege Access: Access is restricted to curated governed views only.
  • Client Ownership: Client owns classification, exposure approval, and risk acceptance decisions.
  • Architecture-Led Security: Governance is enforced structurally through architecture, not AI intelligence alone.

PII Exposure Risk Landscape

Historically, Martech ecosystems frequently operated through extracts, spreadsheets, local flat files, and manually managed datasets due to operational realities and slow centralized data access.

AI fundamentally changes this governance model because AI systems:
  • Operate at machine speed
  • Accumulate conversational context
  • Execute multi-step analytical workflows
  • Generate derived insights
  • Operate across larger contextual surfaces than human users

Key exposure vectors include:
  • Improper upstream ingestion of raw PII
  • Misconfigured governed views
  • Prompt-level PII injection
  • Aggregation-based re-identification
  • Cross-system joins
  • Session context accumulation
  • Export and download surfaces
  • AI-generated analytical outputs

PII Discovery & Classification

PII discovery is a mandatory governance-controlled exercise conducted during onboarding and due diligence.

The platform does not automatically classify or infer PII. Classification ownership remains with client governance and compliance stakeholders.

The Professional Services (PS) team facilitates:
  • Initial discovery workshops
  • Candidate attribute identification
  • Governance review preparation

Every onboarded attribute must be classified into one of the following categories:
  • Direct Identifier
  • Linkable
  • Sensitive
  • Non-PII

The classification becomes the foundation for tokenization, exposure, and AI governance decisions.

Tokenization & Obfuscation Policy

All approved PII attributes must be tokenized before ingestion into CDM.

Supported tokenization approaches include:
  • Vault-based tokenization
  • Salted irreversible hashing
  • Controlled masking
  • Format-preserving tokenization where required

The token vault must:
  • Exist outside CDM
  • Remain inaccessible to MaxAI
  • Be governed by client-controlled security infrastructure

Tokens must:
  • Be non-reversible within CDM and MaxAI
  • Avoid semantic encoding
  • Remain consistent where analytical joinability is required
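
The token requirements above (non-reversible, no semantic encoding, consistent for joins) can be illustrated with keyed hashing. The following is a minimal sketch of the salted irreversible hashing approach using HMAC-SHA256 from the Python standard library. The function name `tokenize`, the `namespace` parameter, the normalization rule, and the demo key are all hypothetical and are not taken from the platform; in practice the secret would be held in client-controlled security infrastructure outside CDM.

```python
import hashlib
import hmac


def tokenize(value: str, secret_key: bytes, namespace: str) -> str:
    """Return a deterministic, non-reversible token for a PII value.

    Assumptions (illustrative only): values are normalized by trimming and
    lower-casing, and `namespace` (for example an attribute name) keeps
    tokens for different attributes from colliding.
    """
    normalized = value.strip().lower()
    message = f"{namespace}:{normalized}".encode("utf-8")
    # Keyed hash: without the secret key the token cannot be recomputed
    # from a guessed input, and the hex digest carries no semantic content.
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()


# Hypothetical key for illustration only; a real key is never hard-coded.
DEMO_KEY = b"demo-key-held-outside-cdm"

token_a = tokenize("Jane.Doe@example.com", DEMO_KEY, "email")
token_b = tokenize(" jane.doe@example.com ", DEMO_KEY, "email")
token_c = tokenize("jane.doe@example.com", DEMO_KEY, "login_id")
```

Here `token_a` and `token_b` are equal, so the tokenized column can still be used as a join key, while `token_c` differs because it belongs to a different attribute namespace. Vault-based and format-preserving tokenization follow different mechanics and are not covered by this sketch.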

Attribute-Level Governance Model

Every attribute entering CDM must carry governance metadata including:
  • PII classification
  • Tokenization requirement
  • Tokenization type
  • LLM shareability
  • Exposure classification
  • Regulatory sensitivity indicators

This metadata becomes the foundational governance contract for downstream AI exposure control.

Example exposure models:

  • RAW
  • AGGREGATED_ONLY
  • SUPPRESSED

Sensitive and non-marketing-required data should not enter CDM.
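
To show how attribute-level metadata could drive exposure decisions, the sketch below models the three example exposure models and filters the attributes a governed view may project. The class and function names, the sample attributes, and the reading of RAW as "row-level values may be exposed" are assumptions made for illustration, not platform definitions.

```python
from dataclasses import dataclass
from enum import Enum


class ExposureModel(Enum):
    RAW = "RAW"                          # assumed: row-level values may be exposed
    AGGREGATED_ONLY = "AGGREGATED_ONLY"  # exposed only inside aggregates
    SUPPRESSED = "SUPPRESSED"            # never exposed


@dataclass(frozen=True)
class AttributeGovernance:
    name: str
    pii_classification: str
    llm_shareable: bool
    exposure: ExposureModel


def columns_for_view(attributes, row_level: bool) -> list[str]:
    """Return the attribute names a governed view may project.

    `row_level=True` describes a view that returns individual records;
    `row_level=False` describes a view that returns aggregates only.
    """
    allowed = []
    for attr in attributes:
        if not attr.llm_shareable or attr.exposure is ExposureModel.SUPPRESSED:
            continue
        if row_level and attr.exposure is ExposureModel.AGGREGATED_ONLY:
            continue
        allowed.append(attr.name)
    return allowed


# Hypothetical sample attributes.
ATTRIBUTES = [
    AttributeGovernance("customer_token", "Linkable", True, ExposureModel.RAW),
    AttributeGovernance("income_band", "Sensitive", True, ExposureModel.AGGREGATED_ONLY),
    AttributeGovernance("postcode_token", "Linkable", False, ExposureModel.SUPPRESSED),
]

row_level_columns = columns_for_view(ATTRIBUTES, row_level=True)
aggregate_columns = columns_for_view(ATTRIBUTES, row_level=False)
```

With these sample values, the row-level view would project only `customer_token`, and the aggregate view would additionally be allowed to use `income_band`.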

Data Access & Exposure Controls

MaxAI operates exclusively on curated governed database views.

The following are prohibited:
  • Direct base table access
  • Token vault access
  • Schema browsing
  • Unrestricted joins
  • Write operations

Governance controls include:
  • Row-level security
  • Column-level masking
  • RBAC
  • Read-only database roles
  • View-level governance
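
The prohibitions above could be backed by an application-level check in front of the database, in addition to read-only roles and view grants. The sketch below assumes a structured request (view name plus explicit column list) rather than free-form SQL, which avoids fragile SQL parsing. The view names, column names, `ViewRequest`, and `authorize` are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass

# Hypothetical catalogue of curated views and the columns each one exposes.
GOVERNED_VIEWS = {
    "v_customer_360": {"customer_token", "segment", "lifetime_value_band"},
    "v_campaign_360": {"campaign_id", "channel", "response_count"},
}


@dataclass(frozen=True)
class ViewRequest:
    view: str
    columns: tuple[str, ...]
    operation: str = "SELECT"


def authorize(request: ViewRequest) -> tuple[bool, str]:
    """Allow only explicit, read-only column selections on governed views."""
    if request.operation.upper() != "SELECT":
        return False, "write operations are prohibited"
    exposed = GOVERNED_VIEWS.get(request.view)
    if exposed is None:
        # Covers base tables, the token vault, and anything not curated.
        return False, "only curated governed views may be queried"
    if not request.columns:
        # No implicit "select everything": this prevents schema browsing.
        return False, "columns must be listed explicitly"
    unknown = [c for c in request.columns if c not in exposed]
    if unknown:
        return False, f"columns not exposed by view: {unknown}"
    return True, "ok"


ok_result = authorize(ViewRequest("v_customer_360", ("customer_token", "segment")))
base_table_result = authorize(ViewRequest("customer_base", ("email",)))
write_result = authorize(ViewRequest("v_customer_360", ("segment",), "DELETE"))
```

A check like this complements, and does not replace, database-level enforcement such as read-only roles, row-level security, and column masking.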

Prompt-Level PII Exposure Governance

Prompt-level risk represents one of the most important new governance areas introduced by AI-assisted systems.

Users may intentionally or unintentionally submit PII into prompts. Such scenarios bypass upstream tokenization controls.

Current governance focus areas include:
  • Prompt audit logging
  • Session traceability
  • Architectural containment
  • Read-only execution boundaries
  • User awareness and governance discipline

This release does not position AI moderation as a deterministic security control.

Session and Memory Governance

AI systems introduce a new governance category referred to as data-in-conversation.

Traditional governance focused primarily on:
  • Data-at-rest
  • Data-in-transit

AI systems additionally require governance of:

  • Session context
  • Prompt histories
  • Derived analytical context
  • Multi-turn accumulation

For this release:

  • Session isolation is mandatory
  • Context persistence is minimized
  • No unrestricted shared memory exists
  • Session and prompt traceability are retained for audit purposes

AI Output Governance

AI-generated outputs can create indirect PII exposure even when upstream tokenization exists.

Governance controls therefore include:

  • Aggregation thresholds
  • Suppression rules
  • Derived PII monitoring
  • Export governance
  • Query narrowing protections
  • Output redaction

Minimum aggregation rules are enforced to reduce re-identification risk from small cohorts or unique analytical patterns.
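
A minimum aggregation rule can be sketched as a small-cohort suppression step applied before results are released. The threshold value, the function name `cohort_counts`, and the record layout below are assumptions for illustration; the actual thresholds are a governance decision.

```python
from collections import Counter

MIN_COHORT_SIZE = 10  # hypothetical threshold, set by governance in practice


def cohort_counts(records, key, min_size=MIN_COHORT_SIZE):
    """Count records per cohort and withhold cohorts below `min_size`.

    Returns the releasable counts and the number of suppressed cohorts,
    so the suppression itself stays visible for audit purposes.
    """
    counts = Counter(record[key] for record in records)
    released = {cohort: n for cohort, n in counts.items() if n >= min_size}
    suppressed = sum(1 for n in counts.values() if n < min_size)
    return released, suppressed


# Hypothetical data: 12 customers in segment A, 3 in segment B.
sample = [{"segment": "A"}] * 12 + [{"segment": "B"}] * 3
released, suppressed = cohort_counts(sample, "segment")
```

In this example segment A is released with a count of 12 and segment B is withheld. Note that if an overall total is also published, a single suppressed cohort can be recovered by subtraction, so totals may need the same treatment.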

Runtime, Capacity & Operational Governance

AI governance is also an operational governance problem.

Unbounded AI workloads can:

  • Increase inference opportunities
  • Create retry-loop vulnerabilities
  • Bypass audit visibility
  • Degrade governance enforcement

Operational governance therefore includes:
  • Query complexity limits
  • Session concurrency controls
  • Capacity-aware throttling
  • Agent execution constraints
  • Audit preservation priorities
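
Two of these controls, query complexity limits and session concurrency controls, can be sketched as a simple admission check. The class name `ExecutionGuard`, the use of a join count as a proxy for query complexity, and the limit values are assumptions for illustration only.

```python
import threading


class ExecutionGuard:
    """Admit a query only if it is within complexity and concurrency limits."""

    def __init__(self, max_concurrent: int, max_joins: int):
        self.max_concurrent = max_concurrent
        self.max_joins = max_joins
        self._active: dict[str, int] = {}
        self._lock = threading.Lock()

    def try_start(self, session_id: str, join_count: int) -> bool:
        # Complexity limit: join count is used here as a simple proxy.
        if join_count > self.max_joins:
            return False
        with self._lock:
            active = self._active.get(session_id, 0)
            if active >= self.max_concurrent:
                return False
            self._active[session_id] = active + 1
            return True

    def finish(self, session_id: str) -> None:
        with self._lock:
            active = self._active.get(session_id, 0)
            if active <= 1:
                self._active.pop(session_id, None)
            else:
                self._active[session_id] = active - 1


guard = ExecutionGuard(max_concurrent=2, max_joins=3)
first = guard.try_start("session-1", join_count=1)
second = guard.try_start("session-1", join_count=2)
third = guard.try_start("session-1", join_count=1)   # over the concurrency limit
too_complex = guard.try_start("session-2", join_count=5)
```

Rejected requests would still be written to the audit trail in a real deployment, so that throttling does not reduce audit visibility.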

Integration & Ecosystem Governance

Any system integrating with CDM or MaxAI becomes part of the governance perimeter.

This includes:
  • BI tools
  • APIs
  • Streaming integrations
  • Third-party connectors
  • External analytical platforms

All integrations must:
  • Consume governed views only
  • Respect tokenization controls
  • Operate within approved access scopes
  • Undergo governance review before onboarding

Compliance & Accountability

Client Responsibilities:
  • PII classification
  • Exposure approval
  • Regulatory interpretation
  • Token vault governance
  • Operational monitoring

Platform Responsibilities:
  • Enforcement architecture
  • View-based governance
  • Read-only AI execution
  • Session isolation
  • Audit traceability

Joint Responsibilities:
  • Governance reviews
  • Integration validation
  • Incident response coordination

Strategic Positioning

This governance framework is intentionally architecture-led and governance-first.

The objective is not to position AI as inherently secure.

The objective is to build governed AI operating boundaries.

This release focuses specifically on:
  • PII protection
  • Tokenization enforcement
  • Controlled analytical exposure
  • Governed onboarding into CDM
  • Read-only AI execution

Broader enterprise AI governance capabilities will evolve incrementally in future roadmap phases.

Final Governance Position

CDM should be treated as a governed analytical substrate — not an unrestricted enterprise data lake.

MaxAI should be treated as a governed intelligence layer — not an autonomous unrestricted AI system.

Security and compliance are enforced through:
  • Architecture
  • Governance
  • Controlled exposure
  • Tokenization
  • Operational discipline

They are not enforced through AI intelligence alone.

CDM Data Acceptance Policy

CDM is a governed analytical substrate — not a general-purpose enterprise data store. Only data that serves a direct Martech, Customer 360, Campaign 360, Flowchart 360, segmentation, personalisation, or AI analytics purpose is permitted to enter CDM. This policy defines what is accepted and what is explicitly rejected.

CDM ACCEPTS:

  • Martech-relevant customer profile attributes required for segmentation and personalisation
  • Consent and preference data (with source, effective date, channel-level precedence, and status)
  • Campaign metadata (goals, budget, dates, state, offer and channel relationships)
  • Contact history and response history relevant to campaign analytics
  • Offer, channel, and product relationship data
  • Analytical aggregates required for Customer 360, Campaign 360, and Flowchart 360
  • Tokenized identifiers required for analytical joins

CDM DOES NOT ACCEPT:
  • Raw email addresses, phone numbers, postal addresses, national IDs, or KYC documents
  • Free-text notes or comments containing uncontrolled PII
  • Unrestricted source system dumps without a documented Martech use case
  • Token vault data or raw cryptographic keys
  • Non-marketing operational data with no Customer 360, Campaign 360 or Flowchart 360 use case
  • Sensitive data not required for segmentation, personalisation, reporting, or analytics

Note: "Martech-relevant" means data directly needed for audience selection, campaign execution, personalisation, analytics, or AI-driven decisioning within the MaxAI/Unica ecosystem. Any attribute that cannot be justified under these purposes must not enter CDM.

Attribute-Level Metadata Requirements for Onboarding

Every attribute onboarded into CDM must carry the following governance metadata. This metadata forms the implementation's PII matrix and feeds directly into the interface document. The fields below are mandatory — partial population blocks onboarding.
  • Source system name
  • Source column name
  • CDM target attribute
  • Business meaning / definition
  • PII category (Direct Identifier / Linkable / Sensitive / Non-PII)
  • Tokenization required: Y/N
  • Tokenization method (vault-based / salted hash / format-preserving / masking)
  • Allowed in 360 view: Y/N
  • Allowed for MaxAI / analytics exposure: Y/N
  • Aggregation-only flag: Y/N (attribute exposed only as aggregate, never row-level)
  • DQ rule reference
  • Lineage mapping (LDZ column → RDV table/column → 360 view)
  • Owner / data steward from implementation side

This metadata is captured in the PII Classification Matrix template.
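
The rule that partial population blocks onboarding can be expressed as a completeness check over one row of the matrix. The sketch below is illustrative: the field keys, the function name `onboarding_blockers`, and the sample row are hypothetical and do not reflect the actual template layout.

```python
# Hypothetical keys, one per mandatory metadata field listed above.
REQUIRED_FIELDS = (
    "source_system", "source_column", "cdm_target_attribute",
    "business_definition", "pii_category", "tokenization_required",
    "tokenization_method", "allowed_in_360_view", "allowed_for_maxai",
    "aggregation_only", "dq_rule_reference", "lineage_mapping", "owner",
)
PII_CATEGORIES = {"Direct Identifier", "Linkable", "Sensitive", "Non-PII"}
YES_NO_FIELDS = (
    "tokenization_required", "allowed_in_360_view",
    "allowed_for_maxai", "aggregation_only",
)


def onboarding_blockers(row: dict[str, str]) -> list[str]:
    """Return the reasons a matrix row blocks onboarding (empty if none)."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not str(row.get(field, "")).strip():
            problems.append(f"missing: {field}")
    category = row.get("pii_category")
    if category and category not in PII_CATEGORIES:
        problems.append(f"unknown PII category: {category}")
    for field in YES_NO_FIELDS:
        value = row.get(field)
        if value and value not in {"Y", "N"}:
            problems.append(f"{field} must be Y or N")
    return problems


# Hypothetical, fully populated row.
complete_row = {
    "source_system": "CRM", "source_column": "EMAIL_ADDR",
    "cdm_target_attribute": "email_token",
    "business_definition": "Tokenized customer email",
    "pii_category": "Direct Identifier", "tokenization_required": "Y",
    "tokenization_method": "vault-based", "allowed_in_360_view": "Y",
    "allowed_for_maxai": "N", "aggregation_only": "N",
    "dq_rule_reference": "DQ-001",
    "lineage_mapping": "LDZ.EMAIL_ADDR -> RDV.CUSTOMER.email_token -> 360",
    "owner": "data.steward",
}
incomplete_row = {k: v for k, v in complete_row.items() if k != "owner"}
```

Calling `onboarding_blockers(complete_row)` returns an empty list, while the row without an owner returns a single "missing: owner" entry and would be held back.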

Audience Resolution Governance Guardrails

The Audience Resolution Layer resolves campaign participation across customer, account, and device levels into a unified customer identifier. Because this layer deals with customer resolution and cross-level mapping, it creates re-identification risk if not governed correctly. The following guardrails apply:
  • Audience identifiers must be tokenized before entering CDM — no raw customer, account, or device IDs in the resolution layer
  • Customer / account / device mappings must not expose raw identifiers in any downstream view
  • Resolution logic must be metadata-driven (via Audience_map table) and fully auditable
  • Unresolved audience records must be quarantined or flagged — not silently dropped or passed through
  • Mapping confidence and resolution status must be captured as metadata per record
  • Downstream 360 views must consume resolved canonical customer keys only — not raw source audience identifiers
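
The guardrails on unresolved records and per-record resolution metadata can be sketched as follows. The in-memory dictionary stands in for the Audience_map table, and its structure, the status values, the confidence field, and all names are assumptions for illustration; all identifiers are presumed to be tokenized already.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ResolutionResult:
    audience_level: str             # "customer", "account", or "device"
    audience_token: str             # tokenized source identifier
    customer_key: Optional[str]     # canonical customer key, if resolved
    status: str                     # "RESOLVED" or "UNRESOLVED"
    confidence: Optional[float]     # mapping confidence, if resolved


def resolve(records, audience_map):
    """Split records into resolved results and a quarantine list.

    `records` is an iterable of (level, token) pairs. `audience_map` maps
    (level, token) to (customer_key, confidence). Nothing is silently dropped.
    """
    resolved, quarantined = [], []
    for level, token in records:
        match = audience_map.get((level, token))
        if match is None:
            quarantined.append(
                ResolutionResult(level, token, None, "UNRESOLVED", None))
        else:
            key, confidence = match
            resolved.append(
                ResolutionResult(level, token, key, "RESOLVED", confidence))
    return resolved, quarantined


# Hypothetical mapping and input records.
AUDIENCE_MAP = {
    ("account", "tok_acc_1"): ("cust_001", 1.0),
    ("device", "tok_dev_9"): ("cust_001", 0.8),
}
resolved, quarantined = resolve(
    [("account", "tok_acc_1"), ("device", "tok_dev_9"), ("device", "tok_dev_x")],
    AUDIENCE_MAP,
)
```

Downstream 360 views would then read only `customer_key` from the resolved results, and the quarantined records remain available for review.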

Consent Model Coverage in CDM

The following consent attributes are in scope for the current release. Items marked as backlog are targeted for Release 26.2.

  • Consent source — in scope
  • Consent effective date — in scope
  • Consent status (current, expired) — in scope via expiry date
  • Consent use in segmentation and activation — in scope
  • Consent lineage from source to 360 — in scope via metadata lineage
  • Channel-level consent precedence — backlog (26.2)
  • Consent revocation handling — backlog (26.2); processing statement dependency noted
  • Missing consent default behaviour — backlog (26.2)
  • Stale consent handling — backlog (26.2)
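
For the in-scope items, consent status derived from the expiry date and its use in segmentation can be sketched as below. The function names, the record layout, and the rule that consent remains valid through the expiry date itself are assumptions. Records without consent data are deliberately not handled, because the default behaviour for missing consent is a backlog item.

```python
from datetime import date


def consent_status(expiry_date: date, as_of: date) -> str:
    """Return "current" or "expired" based on the expiry date.

    Assumption: consent is treated as valid through the expiry date inclusive.
    """
    return "expired" if as_of > expiry_date else "current"


def eligible_for_activation(consents, as_of: date) -> list[str]:
    """Return customer tokens whose consent is effective and current."""
    return [
        c["customer_token"]
        for c in consents
        if c["effective_date"] <= as_of
        and consent_status(c["expiry_date"], as_of) == "current"
    ]


# Hypothetical consent records keyed by tokenized customer identifier.
CONSENTS = [
    {"customer_token": "tok_1", "effective_date": date(2024, 1, 1),
     "expiry_date": date(2025, 12, 31)},
    {"customer_token": "tok_2", "effective_date": date(2023, 1, 1),
     "expiry_date": date(2024, 6, 30)},
]
eligible = eligible_for_activation(CONSENTS, as_of=date(2025, 1, 15))
```

With these sample records only `tok_1` is eligible on the evaluation date, since the consent for `tok_2` expired on 30 June 2024.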