PII Protection, Tokenization & AI Data Governance
Executive Summary
This section establishes the foundational governance framework for PII protection, tokenization, and controlled AI data exposure within the Canonical Data Model (CDM) and MaxAI ecosystem.
The primary objective of this release is to ensure that sensitive and personally identifiable information (PII) is governed before it enters governed analytical and AI-serving layers. The governance posture defined in this section is intentionally architecture-driven and focuses on minimizing exposure risk at ingestion, storage, query, and AI interaction layers.
- Raw PII must not enter AI processing layers.
- PII must be tokenized prior to ingestion into CDM.
- Only analytics-required data should enter governed analytical layers.
- MaxAI must operate exclusively on governed, curated, read-only data surfaces.
- Governance must be enforced structurally through architecture, not probabilistically through AI filtering.
This release specifically focuses on PII exposure prevention, tokenization enforcement, controlled data onboarding into CDM, and governed exposure to MaxAI.
Governance Overview
Source Systems may contain raw PII and sensitive customer information. All approved PII must be identified and tokenized before data enters CDM.
CDM acts as the governed analytical substrate and should not become an unrestricted enterprise storage layer. Only analytics-relevant, governed, approved data is permitted into CDM.
MaxAI operates exclusively on governed curated views and never on raw source tables or token vaults.
Governance controls span the following areas:
- Ingestion & Tokenization: Ensures all inbound data is classified, protected, and tokenized before entering CDM, preventing raw PII exposure within governed analytical and AI-serving layers.
- Data Model Governance: Establishes governance controls across model lifecycle management, including change management, versioning, interface mappings, metadata governance, extension management, data quality controls, lineage traceability, PII classification, and controlled release/deployment processes.
- RBAC & View-Level Access Control: Restricts access through governed roles and curated views, enforcing masking, row-level security, read-only access, and prevention of unrestricted schema or base-table access.
- AI Interaction Layers: Ensures AI services operate only on approved governed datasets with controls around prompt handling, session isolation, output protection, audit logging, and query execution boundaries.
- Metadata Governance: Ensures mappings, lineage, DQ rules, and AI exposure controls are centrally governed through managed metadata. For more information, refer to Metadata Model: The Core Integration Layer.
Core Governance Principles
- Zero Raw PII in AI: No raw PII may reach MaxAI under any condition.
- Tokenization First: All approved PII attributes must be tokenized prior to CDM ingestion.
- Data Minimization: Only analytics-required data should enter governed analytical and AI-serving layers.
- Read-Only AI: MaxAI cannot insert, update, delete, or mutate underlying datasets.
- Least Privilege Access: Access is restricted to curated governed views only.
- Client Ownership: Client owns classification, exposure approval, and risk acceptance decisions.
- Architecture-Led Security: Governance is enforced structurally through architecture, not AI intelligence alone.
PII Exposure Risk Landscape
Historically, Martech ecosystems frequently operated through extracts, spreadsheets, local flat files, and manually managed datasets due to operational realities and slow centralized data access.
AI-assisted systems change this risk profile because they:
- Operate at machine speed
- Accumulate conversational context
- Execute multi-step analytical workflows
- Generate derived insights
- Operate across larger contextual surfaces than human users
PII exposure can arise through:
- Improper upstream ingestion of raw PII
- Misconfigured governed views
- Prompt-level PII injection
- Aggregation-based re-identification
- Cross-system joins
- Session context accumulation
- Export and download surfaces
- AI-generated analytical outputs
PII Discovery & Classification
PII discovery is a mandatory governance-controlled exercise conducted during onboarding and due diligence.
The platform does not automatically classify or infer PII. Classification ownership remains with client governance and compliance stakeholders.
The discovery exercise involves:
- Initial discovery workshops
- Candidate attribute identification
- Governance review preparation
Each attribute is assigned one of the following PII categories:
- Direct Identifier
- Linkable
- Sensitive
- Non-PII
The classification becomes the foundation for tokenization, exposure, and AI governance decisions.
Tokenization & Obfuscation Policy
All approved PII attributes must be tokenized before ingestion into CDM.
Tokenization and obfuscation methods include:
- Vault-based tokenization
- Salted irreversible hashing
- Controlled masking
- Format-preserving tokenization where required
Token vaults must:
- Exist outside CDM
- Remain inaccessible to MaxAI
- Be governed by client-controlled security infrastructure
Tokens must:
- Be non-reversible within CDM and MaxAI
- Avoid semantic encoding
- Remain consistent where analytical joinability is required
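The token requirements above (non-reversible, free of semantic encoding, consistent where joins are needed) can be illustrated with a keyed hash. This is a minimal sketch, assuming HMAC-SHA-256 as one possible way to realize salted irreversible hashing; the function name `tokenize`, the `tok_` prefix, and the example key are hypothetical and not part of the platform.

```python
import hashlib
import hmac

def tokenize(value: str, secret_key: bytes) -> str:
    """Return a deterministic, non-reversible token for a PII value."""
    # Normalize so the same identifier always maps to the same token,
    # which keeps tokenized columns joinable across sources.
    normalized = value.strip().lower()
    digest = hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256)
    # The hex digest carries no semantic information about the input.
    return "tok_" + digest.hexdigest()

# Hypothetical key: assumed to live in client-controlled infrastructure,
# outside CDM and inaccessible to MaxAI.
KEY = b"example-key-held-outside-cdm"

token_a = tokenize("Jane.Doe@example.com ", KEY)
token_b = tokenize("jane.doe@example.com", KEY)
print(token_a == token_b)  # same identifier, same token
```

Because the key never enters CDM, the token cannot be reversed or recomputed inside CDM or MaxAI, while equal inputs still yield equal tokens for analytical joins.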
Attribute-Level Governance Model
Each attribute carries governance metadata covering:
- PII classification
- Tokenization requirement
- Tokenization type
- LLM shareability
- Exposure classification
- Regulatory sensitivity indicators
This metadata becomes the foundational governance contract for downstream AI exposure control.
Example exposure models:
- RAW
- AGGREGATED_ONLY
- SUPPRESSED
Sensitive and non-marketing-required data should not enter CDM.
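The attribute-level metadata and exposure models described above could be represented as a small typed record. The sketch below is illustrative only: the class and field names are hypothetical, and the reading of RAW as "row-level values may be exposed" is an assumption, not a definition taken from the platform.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Exposure(Enum):
    RAW = "RAW"                          # assumed: row-level values may be exposed
    AGGREGATED_ONLY = "AGGREGATED_ONLY"  # only aggregates may be exposed
    SUPPRESSED = "SUPPRESSED"            # never exposed

@dataclass(frozen=True)
class AttributeGovernance:
    name: str
    pii_classification: str          # Direct Identifier / Linkable / Sensitive / Non-PII
    tokenization_required: bool
    tokenization_type: Optional[str]
    llm_shareable: bool
    exposure: Exposure
    regulatory_sensitive: bool = False

def allowed_output_forms(attr: AttributeGovernance) -> set:
    """Return the output forms ('row', 'aggregate') permitted for an attribute."""
    if not attr.llm_shareable or attr.exposure is Exposure.SUPPRESSED:
        return set()
    if attr.exposure is Exposure.AGGREGATED_ONLY:
        return {"aggregate"}
    return {"row", "aggregate"}

region = AttributeGovernance("region", "Non-PII", False, None, True, Exposure.RAW)
age_band = AttributeGovernance("age_band", "Linkable", False, None, True,
                               Exposure.AGGREGATED_ONLY)
health_flag = AttributeGovernance("health_flag", "Sensitive", True, "salted hash",
                                  False, Exposure.SUPPRESSED)
```

Keeping the decision in one function driven by metadata means exposure changes are made by editing the governed record, not the consuming code.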
Data Access & Exposure Controls
MaxAI operates exclusively on curated governed database views.
The following are not permitted for MaxAI:
- Direct base table access
- Token vault access
- Schema browsing
- Unrestricted joins
- Write operations
Access is enforced through:
- Row-level security
- Column-level masking
- RBAC
- Read-only database roles
- View-level governance
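As a rough illustration of a curated view combined with read-only execution, the sketch below uses an in-memory SQLite database as a stand-in for the real platform. The table, view, and column names are hypothetical; the fixed `WHERE` clause only approximates row-level security, and SQLite's `PRAGMA query_only` stands in for a read-only database role. It does not demonstrate base-table denial, which would need role grants in a real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical base table holding tokenized identifiers only.
conn.execute(
    "CREATE TABLE customer_base (customer_token TEXT, region TEXT, birth_year INTEGER)"
)
conn.executemany(
    "INSERT INTO customer_base VALUES (?, ?, ?)",
    [("tok_a1", "EMEA", 1987), ("tok_b2", "APAC", 1992)],
)

# Curated view: masks birth_year to a decade and filters rows by region.
conn.execute(
    """
    CREATE VIEW v_customer_curated AS
    SELECT customer_token,
           region,
           (birth_year / 10) * 10 AS birth_decade
    FROM customer_base
    WHERE region = 'EMEA'
    """
)
conn.commit()

# From here on the connection behaves as read-only.
conn.execute("PRAGMA query_only = ON")

rows = conn.execute("SELECT * FROM v_customer_curated").fetchall()

try:
    conn.execute("INSERT INTO customer_base VALUES ('tok_c3', 'EMEA', 1975)")
    write_blocked = False
except sqlite3.OperationalError:
    # SQLite rejects writes while query_only is enabled.
    write_blocked = True
```

The AI-facing connection sees only the masked, filtered projection, and any attempted mutation fails at the database layer regardless of what the AI generates.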
Prompt-Level PII Exposure Governance
Prompt-level risk represents one of the most important new governance areas introduced by AI-assisted systems.
Users may intentionally or unintentionally submit PII into prompts. Such scenarios bypass upstream tokenization controls.
This release addresses prompt-level risk through:
- Prompt audit logging
- Session traceability
- Architectural containment
- Read-only execution boundaries
- User awareness and governance discipline
This release does not position AI moderation as a deterministic security control.
Session and Memory Governance
AI systems introduce a new governance category referred to as data-in-conversation.
Traditional data governance addresses:
- Data-at-rest
- Data-in-transit
AI systems additionally require governance of:
- Session context
- Prompt histories
- Derived analytical context
- Multi-turn accumulation
For this release:
- Session isolation is mandatory
- Context persistence is minimized
- No unrestricted shared memory exists
- Session and prompt traceability are retained for audit purposes
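The session rules above can be sketched as a per-session context store with a bounded history and a separate audit trail. This is a minimal sketch under stated assumptions: the class name, the `max_turns` limit, and the in-memory audit list are hypothetical, and a real audit trail would be written to durable, access-controlled storage.

```python
from collections import deque

class SessionContextStore:
    """Keeps conversational context isolated per session and bounded in size."""

    def __init__(self, max_turns: int = 5):
        self._max_turns = max_turns
        self._contexts = {}      # session_id -> deque of recent prompts
        self.audit_log = []      # (session_id, prompt) pairs kept for traceability

    def record_turn(self, session_id: str, prompt: str) -> None:
        ctx = self._contexts.setdefault(session_id, deque(maxlen=self._max_turns))
        ctx.append(prompt)                            # oldest turn drops off when full
        self.audit_log.append((session_id, prompt))   # audit trail is never truncated

    def context_for(self, session_id: str) -> list:
        # Only the caller's own session context is ever returned.
        return list(self._contexts.get(session_id, ()))

    def end_session(self, session_id: str) -> None:
        # Context is discarded at session end; the audit trail remains.
        self._contexts.pop(session_id, None)

store = SessionContextStore(max_turns=2)
store.record_turn("s1", "top regions by spend")
store.record_turn("s1", "now only EMEA")
store.record_turn("s1", "break down by channel")
store.record_turn("s2", "campaign count last month")
```

Separating working context (minimized, per session) from the audit trail (retained) lets both rules hold at once.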
AI Output Governance
AI-generated outputs can create indirect PII exposure even when upstream tokenization exists.
Governance controls therefore include:
- Aggregation thresholds
- Suppression rules
- Derived PII monitoring
- Export governance
- Query narrowing protections
- Output redaction
Minimum aggregation rules are enforced to reduce re-identification risk from small cohorts or unique analytical patterns.
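A minimum aggregation rule can be illustrated as a post-aggregation suppression step. The sketch below assumes a hypothetical threshold of 10 and a hypothetical function name; the actual threshold is a governance decision and is not specified in this document.

```python
from collections import Counter

MIN_COHORT_SIZE = 10  # hypothetical threshold, set by governance in practice

def aggregate_with_suppression(rows, group_key, min_size=MIN_COHORT_SIZE):
    """Count rows per group and suppress any group below the minimum size."""
    counts = Counter(row[group_key] for row in rows)
    # Small cohorts are returned as None so no count is disclosed for them.
    return {group: (n if n >= min_size else None) for group, n in counts.items()}

rows = [{"region": "EMEA"}] * 12 + [{"region": "APAC"}] * 3
result = aggregate_with_suppression(rows, "region")
print(result)  # {'EMEA': 12, 'APAC': None}
```

Suppressing the value while still listing the group keeps the output shape stable, although a stricter policy might omit the group entirely.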
Runtime, Capacity & Operational Governance
AI governance is also an operational governance problem.
Unbounded AI workloads can:
- Increase inference opportunities
- Create retry-loop vulnerabilities
- Bypass audit visibility
- Degrade governance enforcement
Operational controls therefore include:
- Query complexity limits
- Session concurrency controls
- Capacity-aware throttling
- Agent execution constraints
- Audit preservation priorities
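Two of the controls above, query complexity limits and session concurrency controls, could be combined in a simple admission gate. This is an illustrative sketch only: the class names, the use of join count as the complexity measure, and the limit values are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class RuntimeLimits:
    max_joins: int = 3                 # hypothetical complexity measure
    max_concurrent_sessions: int = 2   # hypothetical concurrency cap

class ExecutionGate:
    """Admits sessions and queries only while they stay within configured limits."""

    def __init__(self, limits: RuntimeLimits):
        self.limits = limits
        self.active_sessions = set()

    def open_session(self, session_id: str) -> bool:
        if session_id in self.active_sessions:
            return True
        if len(self.active_sessions) >= self.limits.max_concurrent_sessions:
            return False  # throttle: capacity reached
        self.active_sessions.add(session_id)
        return True

    def close_session(self, session_id: str) -> None:
        self.active_sessions.discard(session_id)

    def admit_query(self, session_id: str, join_count: int) -> bool:
        # Queries from unknown sessions or above the join limit are refused.
        return (session_id in self.active_sessions
                and join_count <= self.limits.max_joins)

gate = ExecutionGate(RuntimeLimits(max_joins=2, max_concurrent_sessions=1))
```

Refusing work at admission time, before execution, keeps the decision deterministic and easy to audit.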
Integration & Ecosystem Governance
Any system integrating with CDM or MaxAI becomes part of the governance perimeter.
This includes:
- BI tools
- APIs
- Streaming integrations
- Third-party connectors
- External analytical platforms
Integrated systems must:
- Consume governed views only
- Respect tokenization controls
- Operate within approved access scopes
- Undergo governance review before onboarding
Compliance & Accountability
- PII classification
- Exposure approval
- Regulatory interpretation
- Token vault governance
- Operational monitoring
- Enforcement architecture
- View-based governance
- Read-only AI execution
- Session isolation
- Audit traceability
- Governance reviews
- Integration validation
- Incident response coordination
Strategic Positioning
This governance framework is intentionally architecture-led and governance-first.
The objective is not to position AI as inherently secure.
The objective is to build governed AI operating boundaries.
This release focuses on:
- PII protection
- Tokenization enforcement
- Controlled analytical exposure
- Governed onboarding into CDM
- Read-only AI execution
Broader enterprise AI governance capabilities will evolve incrementally in future roadmap phases.
Final Governance Position
CDM should be treated as a governed analytical substrate — not an unrestricted enterprise data lake.
MaxAI should be treated as a governed intelligence layer — not an autonomous unrestricted AI system.
Protection is enforced through the following, not through AI intelligence alone:
- Architecture
- Governance
- Controlled exposure
- Tokenization
- Operational discipline
CDM Data Acceptance Policy
CDM is a governed analytical substrate — not a general-purpose enterprise data store. Only data that serves a direct Martech, Customer 360, Campaign 360, Flowchart 360, segmentation, personalisation, or AI analytics purpose is permitted to enter CDM. This policy defines what is accepted and what is explicitly rejected.
CDM ACCEPTS:
- Martech-relevant customer profile attributes required for segmentation and personalisation
- Consent and preference data (with source, effective date, channel-level precedence, and status)
- Campaign metadata (goals, budget, dates, state, offer and channel relationships)
- Contact history and response history relevant to campaign analytics
- Offer, channel, and product relationship data
- Analytical aggregates required for Customer 360, Campaign 360, and Flowchart 360
- Tokenized identifiers required for analytical joins
CDM REJECTS:
- Raw email addresses, phone numbers, postal addresses, national IDs, or KYC documents
- Free-text notes or comments containing uncontrolled PII
- Unrestricted source system dumps without a documented Martech use case
- Token vault data or raw cryptographic keys
- Non-marketing operational data with no Customer 360, Campaign 360 or Flowchart 360 use case
- Sensitive data not required for segmentation, personalisation, reporting, or analytics
Attribute-Level Metadata Requirements for Onboarding
- Source system name
- Source column name
- CDM target attribute
- Business meaning / definition
- PII category (Direct Identifier / Linkable / Sensitive / Non-PII)
- Tokenization required: Y/N
- Tokenization method (vault-based / salted hash / format-preserving / masking)
- Allowed in 360 view: Y/N
- Allowed for MaxAI / analytics exposure: Y/N
- Aggregation-only flag: Y/N (attribute exposed only as aggregate, never row-level)
- DQ rule reference
- Lineage mapping (LDZ column → RDV table/column → 360 view)
- Owner / data steward from implementation side
This metadata is captured in the PII Classification Matrix template.
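A completeness check over the onboarding metadata listed above might look like the following sketch. The field keys are hypothetical renderings of the listed items, the allowed category and method values are taken from the list, and the rule that a method is required when tokenization is flagged Y is an assumption about how the template is meant to be used.

```python
from typing import List

REQUIRED_FIELDS = (
    "source_system", "source_column", "cdm_target_attribute",
    "business_definition", "pii_category", "tokenization_required",
    "allowed_in_360_view", "allowed_for_maxai", "aggregation_only",
    "dq_rule_reference", "lineage_mapping", "data_steward",
)
PII_CATEGORIES = {"Direct Identifier", "Linkable", "Sensitive", "Non-PII"}
TOKENIZATION_METHODS = {"vault-based", "salted hash", "format-preserving", "masking"}

def validate_onboarding_record(record: dict) -> List[str]:
    """Return a list of issues; an empty list means the record is complete."""
    issues = [f"missing: {f}" for f in REQUIRED_FIELDS if record.get(f) in (None, "")]
    category = record.get("pii_category")
    if category is not None and category not in PII_CATEGORIES:
        issues.append(f"invalid pii_category: {category}")
    # Assumed rule: a tokenization method must be named whenever tokenization is required.
    if record.get("tokenization_required") == "Y":
        if record.get("tokenization_method") not in TOKENIZATION_METHODS:
            issues.append("tokenization_method required when tokenization_required is Y")
    return issues

good = {
    "source_system": "CRM", "source_column": "cust_email",
    "cdm_target_attribute": "customer_token", "business_definition": "Customer join key",
    "pii_category": "Direct Identifier", "tokenization_required": "Y",
    "tokenization_method": "vault-based", "allowed_in_360_view": "Y",
    "allowed_for_maxai": "Y", "aggregation_only": "N",
    "dq_rule_reference": "DQ-001", "lineage_mapping": "LDZ -> RDV -> 360",
    "data_steward": "steward@client",
}
bad = dict(good, tokenization_method=None, data_steward="")
```

Running such a check before an attribute is admitted keeps incomplete governance records out of CDM onboarding.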
Audience Resolution Governance Guardrails
- Audience identifiers must be tokenized before entering CDM — no raw customer, account, or device IDs in the resolution layer
- Customer / account / device mappings must not expose raw identifiers in any downstream view
- Resolution logic must be metadata-driven (via Audience_map table) and fully auditable
- Unresolved audience records must be quarantined or flagged — not silently dropped or passed through
- Mapping confidence and resolution status must be captured as metadata per record
- Downstream 360 views must consume resolved canonical customer keys only — not raw source audience identifiers
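The quarantine and status-capture guardrails above can be sketched as a routing step that never drops a record. The record keys, the status labels, and the 0.9 confidence threshold below are hypothetical; this does not represent the Audience_map resolution logic itself.

```python
def route_audience_records(records, min_confidence=0.9):
    """Split records into resolved and quarantined, tagging each with its status."""
    resolved, quarantined = [], []
    for rec in records:
        is_resolved = (
            rec.get("canonical_key") is not None
            and rec.get("confidence", 0.0) >= min_confidence
        )
        status = "RESOLVED" if is_resolved else "UNRESOLVED"
        tagged = {**rec, "resolution_status": status}  # status kept per record
        (resolved if is_resolved else quarantined).append(tagged)
    return resolved, quarantined

records = [
    {"source_token": "tok_1", "canonical_key": "cust_A", "confidence": 0.97},
    {"source_token": "tok_2", "canonical_key": None, "confidence": 0.0},
    {"source_token": "tok_3", "canonical_key": "cust_B", "confidence": 0.6},
]
resolved, quarantined = route_audience_records(records)
```

Every input record lands in exactly one of the two outputs, so unresolved records are visible for review instead of silently disappearing.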
Consent Model Coverage in CDM
The following consent attributes are in scope for the current release. Items marked as backlog are targeted for Release 26.2.
- Consent source — in scope
- Consent effective date — in scope
- Consent status (current, expired) — in scope via expiry date
- Consent use in segmentation and activation — in scope
- Consent lineage from source to 360 — in scope via metadata lineage
- Channel-level consent precedence — backlog (26.2)
- Consent revocation handling — backlog (26.2); processing statement dependency noted
- Missing consent default behaviour — backlog (26.2)
- Stale consent handling — backlog (26.2)
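Deriving consent status from the expiry date, as described for the in-scope items, could look like the sketch below. The function names and record keys are hypothetical, and treating the expiry date itself as still current is an assumption. Missing or stale consent is deliberately not handled, since those behaviours are backlog items.

```python
from datetime import date

def consent_status(expiry_date: date, as_of: date) -> str:
    """Return 'current' or 'expired' based on the consent expiry date."""
    # Assumption: consent remains current through the expiry date itself.
    return "current" if as_of <= expiry_date else "expired"

def filter_consented(customers, as_of: date):
    """Keep only customers whose consent is current, for segmentation use."""
    return [c for c in customers
            if consent_status(c["consent_expiry"], as_of) == "current"]

customers = [
    {"customer_token": "tok_1", "consent_expiry": date(2025, 12, 31)},
    {"customer_token": "tok_2", "consent_expiry": date(2025, 3, 31)},
]
eligible = filter_consented(customers, as_of=date(2025, 6, 1))
```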