If you've ever delved into Microsoft Purview, you've likely encountered a wealth of content on Data Lifecycle Management, Data Loss Prevention (DLP), and Information Protection. These are the hot topics everyone is buzzing about—and for good reason.
But here’s the surprising part: while these features get all the spotlight, there’s very little conversation around one of the most foundational building blocks that makes them all work—Sensitive Information Types (SITs).
That's right. SITs are the unsung heroes of your data protection strategy. Without them, your DLP policies wouldn't know what to safeguard, your retention labels wouldn't know what to preserve, and your information protection rules wouldn't know what's sensitive in the first place.
So, since this vital topic doesn't get nearly enough attention, I've decided to break it down for you. In this post, we'll dive into the basics of Sensitive Information Types, why they matter, and how they power everything else in your Purview environment.
Let's give SITs the spotlight they deserve.
What Are Sensitive Information Types (SITs)?
To explain what Sensitive Information Types (SITs) are and why they’re so important, let’s start with a few simple, everyday examples.
🧺 Example 1: Doing Laundry
Before you start the wash, you usually sort your clothes into categories like Whites, Colors, and Delicates. You do this because sorting helps protect your clothes and ensures everything is treated the right way. Whites can get stained by colors, and Delicates need gentle handling.
🧰 Example 2:
Now, let's consider another example. You usually keep prescription medications in a secure, labeled cabinet—away from children and guests. You do this because they're sensitive, potentially dangerous, and require controlled access.
Now, let's go back to the tech side. Microsoft Purview works in a similar way, but instead of clothes or medications, it deals with sensitive data like:
Personally Identifiable Information (PII) : Social Security numbers, passport numbers, national IDs
Financial Data : Credit card numbers, bank account details, IBANs
Health Information : Medical record numbers, health insurance IDs
Confidential Business Data : Trade secrets, internal project codes, customer account numbers
Custom Data Types : Organization-specific identifiers or patterns
To protect this data, Microsoft Purview uses Sensitive Information Types (SITs). They are the building blocks of data classification: they enable the system to automatically detect and classify data across your organization’s digital environment.
Without SITs, classification would lack the intelligence needed to distinguish between general and sensitive data.
Once sensitive data is identified, Purview’s suite of solutions can apply the appropriate protections (such as encryption, access controls, or alerts) to keep that data safe and compliant. SITs play an important role in the following Purview services :
DSPM for AI
Insider Risk Management
Data Map
Unified Catalog
Sensitivity labels (Information protection)
Data Loss Prevention (DLP)
eDiscovery
Communication compliance
Records Management
Retention labels (Data lifecycle management)
How do SITs connect with the Purview suite?
Let's break down how some of the Purview services listed above use SITs.
1. Data Loss Prevention (DLP)
🔍 Where do SITs appear :
In DLP policy rules as conditions like “Content contains → Sensitive info types”, across supported locations (Exchange, SharePoint, OneDrive, Teams, endpoints, etc.).
In DLP controls designed for Microsoft 365 Copilot / Copilot Chat prompts, where rules can block prompts containing SITs.
*️⃣ How SITs are used :
To detect and prevent data leakage / oversharing by recognizing sensitive patterns (SITs) in content and triggering protections.
To enforce controls (block/warn/override/audit) when sensitive data is shared or used in risky ways.
✅ Example 1 — Email (Exchange Online) : prevent accidental external sharing of payment data
Organization wants to prevent employees from emailing payment card data to external recipients (reduce PCI/data leak risk).
They set a DLP rule “Content contains: Credit Card Number (SIT)” in Exchange Online (email).
👉 When a user tries to send an email that contains one or more credit card numbers to an external address, the message is blocked (or the user is warned with a Policy Tip and an override option, depending on the configured action), and the admin can receive an alert/incident.
✅ Example 2 — Microsoft 365 Copilot prompts : stop sensitive data from being used in prompts
Organization wants to stop users from pasting highly sensitive identifiers (e.g., passport numbers / SSNs / bank info) into Copilot prompts to avoid oversharing and data exposure.
They set a DLP rule “Content contains: [selected SITs like Passport Number / SSN]” in the Microsoft 365 Copilot / Copilot Chat location.
👉 When a user enters a prompt containing these sensitive patterns, Copilot is restricted from processing it and the user does not get a response, which effectively blocks the interaction in real time.
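As a rough illustration of the email scenario in Example 1, here is a minimal Python sketch of how a "Content contains: Credit Card Number" rule could be evaluated. It is not Purview's implementation: the regex, the `evaluate_dlp_rule` function, its `internal_domain` parameter, and the action names are all hypothetical, and the Luhn checksum simply stands in for the kind of validation a built-in SIT can perform on top of pattern matching.

```python
import re

# Hypothetical pattern: 13 to 16 digits, optionally separated by spaces or dashes.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    # Double every second digit, counting from the right.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return pattern matches that also pass the checksum."""
    return [m.group() for m in CARD_PATTERN.finditer(text) if luhn_valid(m.group())]


def evaluate_dlp_rule(body: str, recipients: list[str], internal_domain: str) -> str:
    """Return 'block' when card data is sent to an external recipient, else 'allow'."""
    has_card = bool(find_card_numbers(body))
    external = any(not r.lower().endswith("@" + internal_domain) for r in recipients)
    return "block" if has_card and external else "allow"
```

The checksum step is what keeps a random 16-digit order number from triggering the rule; a real policy would also offer warn, override, and audit actions rather than a plain block/allow decision.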
2. Information protection
🔍 Where do SITs appear :
In auto-labeling logic used to apply sensitivity labels automatically based on content conditions.
*️⃣ How SITs are used :
To classify and protect information by automatically applying the right sensitivity label when sensitive patterns are detected.
✅ Example 1 — Service-side auto-labeling (SharePoint / OneDrive): label sensitive files at rest
Organization wants to ensure that any document containing national identifiers (e.g., SSN / national ID) is automatically classified as “Confidential” to enforce consistent protection without relying on users.
They set an auto-labeling rule “Content contains: U.S. Social Security Number (SIT)” → Apply sensitivity label “Confidential – HR” in SharePoint Online and OneDrive (service-side auto-labeling policy).
👉 When a user uploads or saves a document in SharePoint/OneDrive that contains SSNs, the document is automatically assigned the “Confidential – HR” sensitivity label, and the label’s protections apply (for example: encryption/permissions/markings, depending on the label configuration).
✅ Example 2 — Client-side labeling (Word/Outlook): recommend the right label while authoring
Organization wants to guide users to apply the correct sensitivity label while they’re creating content, without fully automating (to reduce false positives and keep user control).
They set a client-side labeling rule “If content contains: Credit Card Number (SIT) → Recommend sensitivity label ‘Highly Confidential – Finance’” in Microsoft Office apps (Word/Excel/PowerPoint/Outlook).
👉 When a user writes an email in Outlook or edits a document in Word that includes credit card numbers, they are prompted with a label recommendation (or the label is applied automatically, if configured that way) and can accept it so the protections are applied immediately.
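The difference between the two examples (apply automatically versus recommend) can be sketched as a small rule table. This is only an illustration under stated assumptions: `LabelRule`, `choose_label`, the "apply"/"recommend" mode names, and the idea that rules are ordered from most to least sensitive are all hypothetical, and the input is assumed to be a count of SIT instances that some detector has already produced.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LabelRule:
    sit_name: str        # SIT the rule looks for
    label: str           # sensitivity label to use when the rule matches
    mode: str            # "apply" (automatic) or "recommend" (user decides)
    min_count: int = 1   # minimum number of detected instances


def choose_label(sit_counts: dict[str, int],
                 rules: list[LabelRule]) -> Optional[tuple[str, str]]:
    """Return (mode, label) for the first matching rule, or None.

    Rules are assumed to be ordered from most to least sensitive.
    """
    for rule in rules:
        if sit_counts.get(rule.sit_name, 0) >= rule.min_count:
            return rule.mode, rule.label
    return None


# Rules mirroring the two examples in this section.
RULES = [
    LabelRule("Credit Card Number", "Highly Confidential – Finance", "recommend"),
    LabelRule("U.S. Social Security Number", "Confidential – HR", "apply"),
]
```

For instance, `choose_label({"U.S. Social Security Number": 3}, RULES)` returns the "apply" mode with the HR label, while content with no detected SITs returns `None` and stays unlabeled.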
3. Data Lifecycle Management
🔍 Where do SITs appear :
In auto-apply retention label policies using “contains sensitive information (SIT)” as a condition.
*️⃣ How SITs are used :
To control retention/deletion of Microsoft 365 content using content signals like SITs.
✅ Example 1 — Auto-apply retention to financial documents (SharePoint/OneDrive)
Organization wants to make sure that financial documents containing banking identifiers are retained for the legally required period (e.g., 7 years), without relying on users to tag anything manually.
They set an auto-apply retention rule “If content contains: International Banking Account Number (IBAN) (SIT) → apply retention label ‘Finance Records – 7 years’” in SharePoint Online and OneDrive.
👉 When a user uploads or stores an invoice/spreadsheet that contains an IBAN in SharePoint/OneDrive, the item is automatically assigned the retention label “Finance Records – 7 years” and then follows the label’s retention behavior (retain for 7 years, then delete/review depending on configuration).
✅ Example 2 — Auto-apply retention to emails containing PII (Exchange Online)
Organization wants to reduce exposure and meet privacy obligations by keeping PII-containing emails only for a controlled duration (e.g., retain 2 years, then delete).
They set an auto-apply retention rule “If content contains: U.S. Social Security Number (SSN) (SIT) → apply retention label ‘PII Email – retain 2 years then delete’” in Exchange Online (mailboxes).
👉 When a user sends or receives an email that contains SSNs, the message is automatically assigned the retention label “PII Email – retain 2 years then delete” and follows that lifecycle (retained for the defined period, then disposed of according to the label settings).
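To show what "retain for N years, then act" means in practice, here is a small Python sketch built on the two labels from the examples above. The `RETENTION_LABELS` structure and the function names are hypothetical, the retention period is assumed to start on the day the label is applied, and the finance label's end action is set to "review" purely for illustration (the text above leaves delete versus review to configuration).

```python
from datetime import date


def add_years(start: date, years: int) -> date:
    """Return the same calendar day `years` later (Feb 29 falls back to Feb 28)."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        return start.replace(year=start.year + years, day=28)


# Hypothetical label settings based on the two examples in this section.
RETENTION_LABELS = {
    "Finance Records – 7 years": {"years": 7, "then": "review"},
    "PII Email – retain 2 years then delete": {"years": 2, "then": "delete"},
}


def retention_outcome(label: str, labeled_on: date, today: date) -> str:
    """Return 'retain' while the period is running, otherwise the label's end action."""
    settings = RETENTION_LABELS[label]
    expires = add_years(labeled_on, settings["years"])
    return "retain" if today < expires else settings["then"]
```

An email labeled on 15 January 2024 with the PII label would be retained through 14 January 2026 and become eligible for deletion from 15 January 2026 onward.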
4. Communication Compliance
🔍 Where do SITs appear :
In Communication Compliance policies, where SITs are among the conditions used to detect risky or sensitive content.
*️⃣ How SITs are used :
To identify and review communications that violate policies (including disclosure of sensitive/confidential info) across Teams, Exchange, Copilot Chat, etc.
Note : Communication Compliance typically doesn’t block the message in real time like DLP. Instead, it detects and captures policy matches for review, and reviewers can then take remediation actions.
✅ Example 1 — Detect credit card numbers shared in Teams chats
Organization wants to reduce compliance risk by detecting when employees share payment card data in internal chat conversations (e.g., PCI risk and improper handling of customer data).
They set a Communication Compliance policy rule “Detect: Credit Card Number (SIT)” in Microsoft Teams messages (and optionally Exchange email too, depending on scope).
👉 When a user posts a Teams message containing credit card numbers, the message is flagged for review: it becomes a policy match in Communication Compliance, and a designated reviewer can investigate and take appropriate action (for example, escalate, remediate, or remove the message where supported).
✅ Example 2 — Detect accidental sharing of internal secrets (custom SIT) in outbound communications
Organization wants to detect accidental disclosure of internal confidential terms (e.g., project codenames, “do-not-share” identifiers, internal contract IDs) in employee communications to protect intellectual property and business confidentiality.
They set a Communication Compliance policy rule “Detect: Custom Sensitive Information Type (SIT) for ‘Project Codename / Internal Deal ID’” in Exchange email and Teams messages (scope based on your monitored channels).
👉 When a user sends an email or writes a Teams message that contains one of those internal identifiers, the message is flagged for review: it is captured as a policy match so a reviewer can validate the context (false positive vs real leak) and then take action (for example, notify the user, escalate to Compliance/Legal, or apply remediation workflows supported by the channel).
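The key contrast with DLP (detect and review rather than block) can be sketched as a simple review queue. The `PolicyMatch` and `ReviewQueue` classes and the status values are hypothetical names chosen for this illustration, and the detected SITs are assumed to be supplied by a separate detection step.

```python
from dataclasses import dataclass, field


@dataclass
class PolicyMatch:
    sender: str
    channel: str
    matched_sits: list[str]
    status: str = "pending"   # later set by a reviewer


@dataclass
class ReviewQueue:
    matches: list[PolicyMatch] = field(default_factory=list)

    def process(self, sender: str, channel: str, detected_sits: list[str],
                monitored_sits: set[str]) -> bool:
        """Record a policy match if needed; always return True (message delivered)."""
        hits = [s for s in detected_sits if s in monitored_sits]
        if hits:
            self.matches.append(PolicyMatch(sender, channel, hits))
        return True

    def resolve(self, index: int, decision: str) -> None:
        """A reviewer closes a match, e.g. as 'false positive' or 'escalated'."""
        self.matches[index].status = decision
```

Note that `process` never stops delivery: the message goes through either way, and the only side effect of a match is a new item waiting for a human reviewer.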
What are the two types of Sensitive Information Types?
SITs fall into two main categories.
🟠 Built-in SITs
These are predefined by Microsoft. They detect common sensitive data like credit card numbers, passport numbers, or email addresses. You don’t need to create them—they’re ready to use.
As of 2026, Microsoft Purview includes over 300 built-in Sensitive Information Types (SITs) for over 100 countries and regions worldwide. These SITs are designed to detect country-specific identifiers such as :
National ID numbers
Tax IDs
Passport numbers
Driver's license numbers
Bank account formats
Health identifiers
🟡 Custom SITs
These are user-defined. You create them when built-in SITs don’t meet your needs. You can define patterns, keywords, or use advanced methods like machine learning to detect specific data unique to your organization.
Custom SITs can be built the classic way from building blocks (the ingredients you use to describe what to look for inside content) :
Regular expression : detects structured identifiers that follow a consistent format, such as internal employee IDs (ABC + 6 digits, or HRX849201357-style identifiers)
Keyword list : detects sensitive content when the presence of specific words is the strongest indicator, such as project codenames (“Project Tradewinds”, “Project Falcon”, etc.)
Keyword dictionary : detects sensitive content based on many keywords that evolve over time, such as client names, customer portfolios, or product and code names
Preconfigured functions : detect well-known formats with built-in logic/validation, such as credit card numbers, which can be combined with supporting keywords like “Card”, “CVC”, or “Expiry” to increase confidence
I will do a deep dive into these four building blocks in an upcoming blog post.
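To give a feel for how a regular expression and supporting keywords combine, here is a minimal Python sketch for the "ABC + 6 digits" employee ID example. Everything in it is an assumption made for illustration: the keyword list, the 300-character proximity window, the two confidence levels, and the `detect_employee_ids` function name are not taken from a real Purview configuration.

```python
import re

EMPLOYEE_ID = re.compile(r"\bABC\d{6}\b")   # primary element: ABC + 6 digits
SUPPORTING_KEYWORDS = ("employee id", "staff number", "badge")  # hypothetical list
PROXIMITY = 300   # characters searched on each side of a match (assumed value)


def detect_employee_ids(text: str) -> list[tuple[str, str]]:
    """Return (match, confidence) pairs for each employee ID found in `text`."""
    lowered = text.lower()
    results = []
    for m in EMPLOYEE_ID.finditer(text):
        # Look for supporting keywords in a window around the match.
        window = lowered[max(0, m.start() - PROXIMITY): m.end() + PROXIMITY]
        supported = any(keyword in window for keyword in SUPPORTING_KEYWORDS)
        results.append((m.group(), "high" if supported else "low"))
    return results
```

With this sketch, "Employee ID: ABC123456" yields a high-confidence match, while "Order ref ABC654321 shipped" still matches the pattern but only with low confidence, which a policy could choose to ignore.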
Custom SITs can also be created in a more advanced way, from a structured reference dataset or a standard template document :
Exact Data Match : to detect your organization’s real identifiers with fewer false positives, such as employee records (match EmployeeID + Name + DOB from HR exports)
Document Fingerprinting : to detect documents that are the same as (or similar to) a reference form, such as an NDA template or internal forms
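The two ideas can be sketched in a few lines of Python. Purview's actual EDM and fingerprinting algorithms are not described in this post, so the sketch swaps in two generic techniques: SHA-256 lookups against a hashed reference table for the exact-match idea, and Jaccard similarity over word shingles for the "same or similar document" idea. All function names and the tokenization rules are hypothetical.

```python
import hashlib
import re


def hash_value(value: str) -> str:
    """Normalize and hash a value so the raw data does not need to be stored."""
    return hashlib.sha256(value.strip().lower().encode("utf-8")).hexdigest()


def build_index(rows: list[dict[str, str]]) -> dict[str, set[str]]:
    """Map each hashed EmployeeID to the hashes of that row's other fields."""
    return {
        hash_value(row["EmployeeID"]): {
            hash_value(v) for k, v in row.items() if k != "EmployeeID"
        }
        for row in rows
    }


def edm_match(text: str, index: dict[str, set[str]]) -> bool:
    """True if text holds a known EmployeeID plus another field from the same row."""
    tokens = re.findall(r"[\w-]+", text)
    # Hash single tokens and adjacent pairs so two-word names can match.
    candidates = {hash_value(t) for t in tokens}
    candidates |= {hash_value(a + " " + b) for a, b in zip(tokens, tokens[1:])}
    return any(key in candidates and index[key] & candidates for key in index)


def shingles(text: str, size: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping word n-grams in `text`."""
    words = text.lower().split()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(template: str, document: str) -> float:
    """Jaccard similarity between the word shingles of two texts."""
    a, b = shingles(template), shingles(document)
    return len(a & b) / len(a | b) if a | b else 0.0
```

The exact-match function only fires when an ID appears together with another value from the same record, which is why this approach produces fewer false positives than a pattern alone; the similarity score lets a filled-in copy of a template still be recognized even though it is no longer identical.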
Who can create, manage, and view Sensitive Information Types?
Administrative roles play a key part in managing Sensitive Information Types (SITs). They make sure that only the right people can create, adjust, or view settings related to sensitive data.
With these roles, authorized users can define custom SITs, set up detection rules, and keep an eye on how sensitive information is being identified and protected across the organization.
Let's break this down in a table.
Task | Role group | ✅ Can do | 🔻 Typical use
Create / Edit / Delete custom SITs | 🕵️ Information Protection Admins | Create, edit, delete all classifier types (includes custom SITs, EDM/fingerprint-related classifiers), and manage related IP controls. | Small group (1–3 people) in the Security/Compliance team designs and maintains classifiers.
EDM operations only (upload/refresh EDM datasets) | 🕵️ Exact Data Match Upload Admins | Upload data for Exact Data Match (EDM). | Assigned to a separate operator account/team (often HR/IT ops) so the people who design policies are not necessarily the ones who handle sensitive source datasets.
Monitor and triage “what’s happening” (DLP alerts + Activity Explorer) | 🕵️ Information Protection Analysts | Access/manage DLP alerts and Activity Explorer, while having view-only access to policies/labels/classifiers. | SOC/Compliance analysts use this daily to monitor incidents and investigate user activity without changing classifiers.
Investigate incidents deeply (includes Content Explorer capability via role group) | 🕵️ Information Protection Investigators | Access/manage DLP alerts, Activity Explorer, and Content Explorer, while having view-only access to policies/labels/classifiers. | SOC/Compliance investigators who need to examine the content behind an incident, still without changing classifiers.
View-only reporting (no creation, no investigation) | 🕵️ Information Protection Readers (role group) | View-only access to reports for DLP and sensitivity labels. | Managers / auditors who need reporting visibility, not operations.
And last but not least, what licensing is needed for Sensitive Information Types?
Microsoft Purview features are licensed mainly in two complementary ways: per‑user licenses (the classic Microsoft 365 E3/E5 model) and pay‑as‑you‑go (Azure consumption) for certain capabilities and non‑Microsoft 365 scenarios.
Organizations with Microsoft 365 E3
✅ Possible (Built‑in + Custom SITs)
Built‑in SITs: You can use built‑in SITs to detect sensitive data in DLP for email & files (Exchange/SharePoint/OneDrive), because these capabilities are included for E3 in Microsoft’s licensing comparison.
Custom SITs: You can create custom SITs using regex / keyword lists / keyword dictionaries / functions, and then use them as conditions in DLP (custom SITs are defined exactly with those elements, and DLP evaluates regex/keywords/functions).
❌ Not possible (without E5-level features / add-ons)
EDM (Exact Data Match) and trainable classifiers (ML) are advanced classification capabilities listed under E5 Information Protection & Governance entitlements rather than baseline E3.
Fingerprint SITs (document fingerprinting): Microsoft indicates E3 customers can’t create/modify fingerprints without upgrading to E5 (after April 2023).
OCR scanning is Pay‑As‑You‑Go (Azure billing required), not “included” as part of E3.
Organizations with Microsoft 365 E5
✅ Possible (Built‑in + Custom SITs)
Built‑in SITs: Same as E3 (built‑in SITs usable for DLP detections), plus broader advanced compliance coverage included with E5.
Custom SITs: You can create and use custom SITs (regex/keywords/functions) across Purview controls; custom SIT authoring is part of the core SIT model and DLP evaluates those patterns.
✅ Also possible (Advanced classifiers)
EDM and trainable classifiers (ML) are available as advanced Information Protection & Governance capabilities at the E5 level.
Fingerprint SITs: Creating and maintaining document fingerprints is fully supported for E5 customers.
❌ Not possible (without PAYG when applicable)
OCR scanning isn’t “free with E5”—it remains PAYG and must be enabled with Azure billing; then it extends detection to images.
Organizations with Microsoft 365 Business Premium
✅ Possible (Built‑in + Custom SITs)
Built‑in SITs: Business Premium includes Purview DLP and sensitivity labels, so you can use built‑in SITs in DLP rules for email/files.
Custom SITs: Business Premium supports creating custom labels and using DLP; custom SITs can be created with regex/keywords/functions and then used in DLP conditions (same DLP detection model).
❌ Not possible (without E5-level add-ons)
EDM and trainable classifiers (ML) are advanced IP&G capabilities associated with E5-level entitlements in Microsoft’s licensing comparison.
Fingerprint SITs require E5 for ongoing create/modify per Microsoft’s fingerprinting guidance.
OCR scanning is PAYG (Azure billing) and then it extends your existing SIT/classifier detections to images
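The three license sections above boil down to a small capability matrix, transcribed here as a Python lookup for quick comparison. The capability keys, the status strings, and the helper function names are hypothetical; the values only restate what this post says and should be checked against Microsoft's current licensing documentation before making a purchasing decision.

```python
# Values: "included", "not included", or "payg" (pay-as-you-go, Azure billing).
LICENSING = {
    "Microsoft 365 E3": {
        "built-in SITs": "included",
        "custom SITs": "included",
        "EDM": "not included",
        "trainable classifiers": "not included",
        "document fingerprinting": "not included",
        "OCR": "payg",
    },
    "Microsoft 365 E5": {
        "built-in SITs": "included",
        "custom SITs": "included",
        "EDM": "included",
        "trainable classifiers": "included",
        "document fingerprinting": "included",
        "OCR": "payg",
    },
    "Microsoft 365 Business Premium": {
        "built-in SITs": "included",
        "custom SITs": "included",
        "EDM": "not included",
        "trainable classifiers": "not included",
        "document fingerprinting": "not included",
        "OCR": "payg",
    },
}


def availability(plan: str, capability: str) -> str:
    """Return the status of a capability for a given plan."""
    return LICENSING[plan][capability]


def plans_with(capability: str) -> list[str]:
    """Return the plans that include a capability without add-ons or PAYG."""
    return [plan for plan, caps in LICENSING.items() if caps[capability] == "included"]
```

Calling `plans_with("EDM")` returns only the E5 plan, while `plans_with("custom SITs")` returns all three.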
If you liked this tip and infographic on Microsoft Purview Sensitive Info Types and think it will be useful to others as well, feel free to share it.