Identity Enrichment
Overview
The Identity Enrichment workflow enables you to enrich customer identifiers (emails, phone numbers, or physical addresses) with comprehensive demographic, behavioral, and interest data. This workflow supports lists of arbitrary size, from single identifiers to millions of records.
Use Case
Goal: Given customer identifiers (emails, phone numbers, or addresses), retrieve detailed profile data including demographics, interests, affinities, purchase behavior, and more.
Business Value:
- Enrich CRM data with demographic and behavioral attributes
- Personalize marketing campaigns based on customer profiles
- Score leads based on demographic fit
- Append missing contact information (phone, address)
- Validate and update existing customer records
- Build customer segments for targeted messaging
Workflow Overview
Workflow Steps
Step 1: Resolve Identifiers to Person IDs
Convert customer identifiers into standardized person IDs that can be used for enrichment.
Tool: resolve_identities
Single Identifier Type:
Use when you have one type of identifier (e.g., only emails).
Request:
{
"jsonrpc": "2.0",
"method": "tools/call",
"params": {
"name": "resolve_identities",
"arguments": {
"id_type": "email",
"id_hash": "plaintext",
"identifiers": [
"customer1@example.com",
"customer2@example.com",
"customer3@example.com"
]
}
},
"id": 1
}
Response:
{
"jsonrpc": "2.0",
"result": {
"identities": [
{
"person_id": 12345,
"overall_quality_score": 0.95,
"matches": [
{
"criterion_type": "email_plaintext",
"criterion_value": "customer1@example.com",
"quality_score": 0.95
}
],
"identifiers": {
"email": ["customer1@example.com", "alt1@example.com"],
"phone": ["5551234567"]
}
},
{
"person_id": 12346,
"overall_quality_score": 0.88,
"matches": [
{
"criterion_type": "email_plaintext",
"criterion_value": "customer2@example.com",
"quality_score": 0.88
}
],
"identifiers": {
"email": ["customer2@example.com"]
}
}
],
"stats": {
"requested": 3,
"resolved": 2,
"rate": 0.67
},
"workflow_id": "b2c3d4e5-f6a7-8901-bcde-f2345678901a",
"tool_trace_id": "trace_ghi789"
},
"id": 1
}
Multiple Identifier Types (Recommended):
Use when you have multiple types of identifiers. This provides better match rates by matching across identifier types.
Request:
{
"jsonrpc": "2.0",
"method": "tools/call",
"params": {
"name": "resolve_identities",
"arguments": {
"multi_identifiers": [
{
"id_type": "email",
"hash_type": "plaintext",
"values": [
"customer1@example.com",
"customer2@example.com"
]
},
{
"id_type": "phone",
"hash_type": "plaintext",
"values": [
"5551234567",
"5559876543"
]
}
]
}
},
"id": 1
}
Response:
{
"jsonrpc": "2.0",
"result": {
"identities": [
{
"person_id": 12345,
"overall_quality_score": 0.92,
"matches": [
{
"criterion_type": "email_plaintext",
"criterion_value": "customer1@example.com",
"quality_score": 0.95
},
{
"criterion_type": "phone_plaintext",
"criterion_value": "5551234567",
"quality_score": 0.85
}
],
"identifiers": {
"email": ["customer1@example.com", "alt1@example.com"],
"phone": ["5551234567", "5559999999"]
}
},
{
"person_id": 12346,
"overall_quality_score": 0.88,
"matches": [
{
"criterion_type": "phone_plaintext",
"criterion_value": "5559876543",
"quality_score": 0.88
}
],
"identifiers": {
"phone": ["5559876543", "5551111111"]
}
}
],
"stats": {
"requested": 4,
"resolved": 2,
"rate": 0.50
},
"workflow_id": "b2c3d4e5-f6a7-8901-bcde-f2345678901a",
"tool_trace_id": "trace_ghi789"
},
"id": 1
}
Hash Types:
| Hash Type | Description | Use Case |
|---|---|---|
| plaintext | Raw, unhashed identifiers | Recommended for best match rates |
| md5 | MD5 hashed identifiers | Legacy systems, privacy requirements |
| sha1 | SHA-1 hashed identifiers | Security compliance |
| sha256 | SHA-256 hashed identifiers | High security requirements |
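As a rough illustration of how hashed identifiers for the types in this table might be produced on the client side, the sketch below uses Node's built-in crypto module. It assumes, without confirmation from this documentation, that the service expects lowercase hex digests of the trimmed, lowercased plaintext; hashIdentifier is a hypothetical helper name.

```javascript
const crypto = require("crypto");

// Hash types named in the table above, excluding "plaintext".
const SUPPORTED_HASHES = ["md5", "sha1", "sha256"];

// Hypothetical helper: returns a lowercase hex digest of the normalized value.
// Assumption: identifiers are trimmed and lowercased before hashing.
function hashIdentifier(value, hashType) {
  if (!SUPPORTED_HASHES.includes(hashType)) {
    throw new Error(`Unsupported hash type: ${hashType}`);
  }
  const normalized = value.trim().toLowerCase();
  return crypto.createHash(hashType).update(normalized, "utf8").digest("hex");
}

// Build resolve_identities arguments with SHA-256 hashed emails.
const hashedArgs = {
  id_type: "email",
  id_hash: "sha256",
  identifiers: ["Customer1@Example.com"].map((e) => hashIdentifier(e, "sha256")),
};
console.log(hashedArgs.identifiers[0].length); // 64 hex characters for SHA-256
```

If your source system already stores hashes, confirm that they were computed over the same normalized form before submitting them.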
Hashed Identifier Example:
{
"id_type": "email",
"id_hash": "md5",
"identifiers": [
"5d41402abc4b2a76b9719d911017c592",
"098f6bcd4621d373cade4e832627b4f6"
]
}
Identifier Formatting:
| Type | Format | Example | Notes |
|---|---|---|---|
| Email | Lowercase, normalized | john@example.com | Gmail variations automatically normalized |
| Phone | Digits only, no country code (NDC+SN) | 5551234567 | No dashes, spaces, or +1 prefix |
| Address | Full standardized address | 123 Main St, San Francisco, CA 94105 | Plaintext only, automatically geocoded |
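The formatting rules in this table can be applied before submission with a couple of small helpers. This is a minimal sketch: normalizeEmail and normalizePhone are hypothetical names, and the phone helper assumes US-style 10-digit numbers where a leading country code of 1 can be dropped.

```javascript
// Hypothetical helper: trims whitespace and lowercases an email address.
function normalizeEmail(email) {
  return email.trim().toLowerCase();
}

// Hypothetical helper: keeps digits only and drops a leading "1" country code.
// Assumption: inputs are US-style numbers (10 digits after the country code).
function normalizePhone(phone) {
  const digits = phone.replace(/\D/g, "");
  if (digits.length === 11 && digits.startsWith("1")) {
    return digits.slice(1);
  }
  return digits;
}

console.log(normalizeEmail("  John@Example.COM ")); // john@example.com
console.log(normalizePhone("+1 (555) 123-4567")); // 5551234567
```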
Quality Scores:
- quality_score (0-1): Confidence for an individual match criterion
- overall_quality_score (0-1): Combined confidence using Noisy-OR aggregation
  - Formula: 1 - ∏(1 - score_i)
  - Preserves strong signals: email=0.9, phone=0.1 → 0.91
  - Reinforces weak signals: 0.6, 0.6 → 0.84
- Filter by score to ensure match quality (e.g., >= 0.5)
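To make the Noisy-OR arithmetic concrete, the short sketch below reproduces the two worked figures from the list. The function name noisyOr is illustrative only; the service computes overall_quality_score for you.

```javascript
// Noisy-OR: 1 minus the product of (1 - score_i) over all match scores.
function noisyOr(scores) {
  return 1 - scores.reduce((product, s) => product * (1 - s), 1);
}

console.log(noisyOr([0.9, 0.1]).toFixed(2)); // "0.91"
console.log(noisyOr([0.6, 0.6]).toFixed(2)); // "0.84"
```

Because each factor (1 - score_i) is at most 1, adding another matching criterion can only keep the combined score the same or raise it.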
Key Points:
- Returns only matched identifiers (check stats.rate for match percentage)
- multi_identifiers provides better match rates by searching across multiple types
- Identifiers are grouped by type in the response
- Use person_id values in Step 2 for enrichment
- No per-request limit on number of identifiers
- For large datasets (greater than 100K identifiers), use export format
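Putting Step 1 together, the following sketch builds the JSON-RPC envelope shown in the request examples and pulls person IDs out of a result while applying a quality threshold. Both function names are hypothetical, and the conversion of person_id to a string is an assumption based on the string IDs used in the get_person request example.

```javascript
// Builds the JSON-RPC envelope used in the single-identifier request example.
function buildResolveRequest(idType, idHash, identifiers, requestId = 1) {
  return {
    jsonrpc: "2.0",
    method: "tools/call",
    params: {
      name: "resolve_identities",
      arguments: { id_type: idType, id_hash: idHash, identifiers },
    },
    id: requestId,
  };
}

// Keeps identities at or above minScore and returns their person IDs as strings.
function extractPersonIds(result, minScore = 0.5) {
  return result.identities
    .filter((identity) => identity.overall_quality_score >= minScore)
    .map((identity) => String(identity.person_id));
}

// Trimmed-down version of the sample response above.
const sampleResult = {
  identities: [
    { person_id: 12345, overall_quality_score: 0.95 },
    { person_id: 12346, overall_quality_score: 0.88 },
  ],
  stats: { requested: 3, resolved: 2, rate: 0.67 },
};
console.log(extractPersonIds(sampleResult, 0.9)); // [ '12345' ]
```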
Step 2: Enrich Person Profiles
Retrieve detailed profile data for resolved person IDs.
Tool: get_person
Request:
{
"jsonrpc": "2.0",
"method": "tools/call",
"params": {
"name": "get_person",
"arguments": {
"person_ids": ["12345", "12346"],
"domains": [
"demographic",
"interest",
"affinity",
"lifestyle",
"household",
"financial"
],
"format": "none"
}
},
"id": 2
}
Response:
{
"jsonrpc": "2.0",
"result": {
"profiles": [
{
"person_id": "12345",
"metadata": {
"quality_score": 0.92,
"last_modified": "2025-01-15T10:30:00Z"
},
"domains": {
"age_range": "35-44",
"gender": "Female",
"education": "Bachelor Degree",
"household_income_range": "$100K-$150K",
"marital_status": "Married",
"interested_fitness": "Yes",
"interested_healthy_living": "Yes",
"interested_cooking": "Yes",
"health_affinity": "High",
"family_affinity": "Medium",
"is_health_conscious": "Yes",
"is_pet_owner": "Yes",
"number_children_in_household": "2",
"household_net_worth_range": "$250K-$500K"
}
},
{
"person_id": "12346",
"metadata": {
"quality_score": 0.88,
"last_modified": "2025-01-14T08:20:00Z"
},
"domains": {
"age_range": "45-54",
"gender": "Male",
"education": "Graduate Degree",
"household_income_range": "$150K+",
"interested_golf": "Yes",
"interested_fitness": "Yes",
"auto_affinity": "High",
"health_affinity": "Medium",
"is_home_owner": "Yes",
"is_investor": "Yes"
}
}
],
"workflow_id": "b2c3d4e5-f6a7-8901-bcde-f2345678901a",
"tool_trace_id": "trace_jkl012"
},
"id": 2
}
Available Domains (18 total):
| Domain | Description | Example Attributes |
|---|---|---|
| demographic | Age, gender, education, income, ethnicity | age_range, gender, education, household_income_range |
| interest | Hobbies, activities, interests | interested_fitness, interested_golf, interested_cooking |
| affinity | Brand and category affinities | health_affinity, auto_affinity, family_affinity |
| lifestyle | Lifestyle attributes and behaviors | is_health_conscious, is_pet_owner, is_home_owner |
| household | Household composition | number_children_in_household, number_adults_in_household |
| financial | Financial status and credit | household_net_worth_range, credit_rating_range, owns_investments |
| purchase | Purchase history and behavior | purchased_clothing, purchased_books, apparel_purchases_total_spend |
| employment | Job and career information | occupation_category |
| political | Political affiliations | political_party_affiliation, donated_political_cause_recently |
| email | Email addresses | email1, email2, email3 |
| phone | Phone numbers | phone1, phone2, phone3 |
| address | Physical addresses | address1, address2, address3 |
| name | Person names | first_name, last_name |
| id | Identity information | Various identity attributes |
| maid | Mobile advertising IDs | Mobile ad identifiers |
| content | Content consumption patterns | Reading habits, media consumption |
| intent_category | Consumer intent categories | High-level intent signals |
| intent_topic | Specific intent topics | Detailed intent topics |
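A client can catch misspelled domain names before sending a request by checking them against the 18 names in this table. The sketch below is one way to do that; findUnknownDomains is a hypothetical helper, and how the service itself reacts to an unknown domain is not described here.

```javascript
// The 18 domain names listed in the table above.
const KNOWN_DOMAINS = new Set([
  "demographic", "interest", "affinity", "lifestyle", "household", "financial",
  "purchase", "employment", "political", "email", "phone", "address",
  "name", "id", "maid", "content", "intent_category", "intent_topic",
]);

// Hypothetical helper: returns the requested names that are not in the table.
function findUnknownDomains(requested) {
  return requested.filter((domain) => !KNOWN_DOMAINS.has(domain));
}

console.log(findUnknownDomains(["demographic", "intent"])); // [ 'intent' ]
```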
Domain Selection Strategy:
For Lead Scoring:
"domains": ["demographic", "financial", "lifestyle", "intent_category"]
For Personalization:
"domains": ["interest", "affinity", "content", "purchase"]
For Contact Appending:
"domains": ["email", "phone", "address", "name"]
For Comprehensive Profiles:
"domains": [
"address", "affinity", "content", "demographic", "email",
"employment", "financial", "household", "id", "intent_category",
"intent_topic", "interest", "lifestyle", "maid", "name",
"phone", "political", "purchase"
]
Key Points:
- Maximum 1,000 person IDs per request (batch larger lists)
- Attributes returned as flat key-value pairs in domains object
- Multi-value fields (email, phone, address) are numbered: email1, email2, email3
- Request only needed domains to optimize response size and performance
- profiles array always populated, even when using export format
- Missing attributes are omitted from response (not returned as null)
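Since multi-value fields arrive as numbered keys and missing attributes are simply absent, it can help to gather them into arrays on the client. The sketch below shows one possible approach; collectNumbered is a hypothetical helper, not part of the API.

```javascript
// Gathers keys such as email1, email2, ... from a flat domains object
// into an array ordered by their numeric suffix.
function collectNumbered(domains, prefix) {
  const pattern = new RegExp(`^${prefix}(\\d+)$`);
  return Object.keys(domains)
    .map((key) => ({ key, match: key.match(pattern) }))
    .filter((entry) => entry.match !== null)
    .sort((a, b) => Number(a.match[1]) - Number(b.match[1]))
    .map((entry) => domains[entry.key]);
}

const sampleDomains = {
  email2: "secondary@example.com",
  email1: "primary@example.com",
  phone1: "5551234567",
  first_name: "Jane",
};
console.log(collectNumbered(sampleDomains, "email"));
// [ 'primary@example.com', 'secondary@example.com' ]
console.log(collectNumbered(sampleDomains, "address")); // [] (omitted, not null)
```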
Complete Workflow Summary
Data Flow Example:
Input:
- customer1@example.com
- customer2@example.com
- 5551234567
↓ resolve_identities
Output:
- person_id: 12345 (customer1@example.com)
- person_id: 12346 (5551234567)
↓ get_person
Output:
- person_id: 12345 → {age_range: "35-44", gender: "Female", ...}
- person_id: 12346 → {age_range: "45-54", gender: "Male", ...}
Performance Considerations
Batching Strategy
Identity Resolution:
- No per-request limit on identifiers
- For small lists (less than 1,000): Use format: "none" for inline results
- For large lists (greater than 1,000): Use format: "csv" or format: "json" for export
Profile Enrichment:
- Maximum 1,000 person IDs per request
- Batch larger lists in chunks of 1,000
Example Batching Logic:
// Split a large person ID list into batches and enrich each one.
// getPerson is assumed to be your own wrapper around the get_person tool call.
async function enrichInBatches(personIds, workflowId) {
  const batchSize = 1000; // get_person accepts at most 1,000 IDs per request
  const allProfiles = [];
  for (let i = 0; i < personIds.length; i += batchSize) {
    const batch = personIds.slice(i, i + batchSize);
    const response = await getPerson({
      person_ids: batch,
      domains: ["demographic", "interest", "affinity"],
      workflow_id: workflowId
    });
    allProfiles.push(...response.profiles);
  }
  return allProfiles;
}
Export Formats
| Format | Best For | Use Case |
|---|---|---|
| none | Small datasets (less than 100 records) | Real-time enrichment, API responses |
| csv | Medium/large datasets | CRM imports, spreadsheet analysis |
| json | Programmatic processing | Application integration, data pipelines |
| jsonl | Very large datasets | Streaming, incremental processing |
Export Example:
{
"name": "resolve_identities",
"arguments": {
"id_type": "email",
"id_hash": "plaintext",
"identifiers": [...],
"format": "csv"
}
}
Export Response:
{
"identities": [...],
"stats": {...},
"export": {
"url": "https://s3.amazonaws.com/presigned-url...",
"format": "csv",
"rows": 50000,
"size_bytes": 15728640,
"expires_at": "2025-01-18T13:00:00Z"
}
}
Export URLs:
- Valid for 1 hour from generation
- Download immediately or store file on your infrastructure
- Supports standard HTTP GET requests
- No authentication required (presigned URL)
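The expiry and download behavior described above could be handled along the following lines. This is a sketch only: isExportExpired and downloadExport are hypothetical helpers, the safety margin parameter is an illustrative addition, and the global fetch function requires Node 18 or later.

```javascript
// Returns true once the presigned URL has reached its expires_at timestamp.
// safetyMarginMs is a hypothetical buffer for downloads that take a while to start.
function isExportExpired(exportInfo, now = new Date(), safetyMarginMs = 0) {
  const expiresAt = new Date(exportInfo.expires_at).getTime();
  return now.getTime() + safetyMarginMs >= expiresAt;
}

// Downloads the export with a plain HTTP GET and returns the file contents.
async function downloadExport(exportInfo) {
  if (isExportExpired(exportInfo)) {
    throw new Error("Export URL expired; re-run the request to get a new one");
  }
  const response = await fetch(exportInfo.url);
  if (!response.ok) {
    throw new Error(`Download failed with HTTP ${response.status}`);
  }
  return Buffer.from(await response.arrayBuffer());
}

const sampleExport = { url: "https://example.invalid/export.csv", expires_at: "2025-01-18T13:00:00Z" };
console.log(isExportExpired(sampleExport, new Date("2025-01-18T12:30:00Z"))); // false
```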
Workflow ID Tracking
Use the same workflow_id across related requests for end-to-end tracing:
// Step 1: Identity resolution
{
"name": "resolve_identities",
"arguments": {
"id_type": "email",
"id_hash": "plaintext",
"identifiers": [...],
"workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}
}
// Step 2: Profile enrichment (use same workflow_id)
{
"name": "get_person",
"arguments": {
"person_ids": [...],
"domains": [...],
"workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}
}
Benefits:
- End-to-end request tracing in observability platform
- Performance analysis across tools
- Support troubleshooting with correlated logs
- Feedback submission for data quality issues
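One way to keep the identifier consistent across both steps is to generate it once and attach it to every set of tool arguments. The sketch below assumes the service accepts a client-generated UUID as workflow_id, as the request examples in this section suggest; withWorkflowId is a hypothetical helper.

```javascript
const { randomUUID } = require("crypto");

// Hypothetical helper: returns a copy of the tool arguments with workflow_id set.
// Assumption: a client-generated UUID is an acceptable workflow_id.
function withWorkflowId(toolArguments, workflowId) {
  return { ...toolArguments, workflow_id: workflowId };
}

const sharedWorkflowId = randomUUID();
const resolveArgs = withWorkflowId(
  { id_type: "email", id_hash: "plaintext", identifiers: ["customer1@example.com"] },
  sharedWorkflowId
);
const enrichArgs = withWorkflowId(
  { person_ids: ["12345"], domains: ["demographic"] },
  sharedWorkflowId
);
console.log(resolveArgs.workflow_id === enrichArgs.workflow_id); // true
```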
Performance Benchmarks
| Operation | Records | Typical Latency |
|---|---|---|
| resolve_identities (inline) | 100 | less than 1 second |
| resolve_identities (inline) | 1,000 | 1-2 seconds |
| resolve_identities (export) | 100,000 | 5-10 seconds |
| get_person | 100 | less than 1 second |
| get_person | 1,000 | 2-3 seconds |
Common Variations
Contact Appending
Add missing email, phone, or address to existing records:
{
"name": "get_person",
"arguments": {
"person_ids": ["12345", "12346"],
"domains": ["email", "phone", "address", "name"]
}
}
Response includes up to 3 values per identifier type:
{
"person_id": "12345",
"domains": {
"email1": "primary@example.com",
"email2": "secondary@example.com",
"phone1": "5551234567",
"phone2": "5559876543",
"address1": "123 Main St, San Francisco, CA 94105",
"first_name": "Jane",
"last_name": "Smith"
}
}
Lead Scoring
Enrich leads and score based on demographic fit:
const profiles = await getPerson({
person_ids: leadIds,
domains: ["demographic", "financial", "lifestyle", "intent_category"]
});
// Score each lead
const scoredLeads = profiles.profiles.map(profile => {
let score = 0;
if (profile.domains.household_income_range === "$150K+") score += 30;
if (profile.domains.household_net_worth_range === "$500K+") score += 30;
if (profile.domains.education === "Graduate Degree") score += 20;
if (profile.domains.is_home_owner === "Yes") score += 10;
if (profile.domains.is_investor === "Yes") score += 10;
return { person_id: profile.person_id, score };
});
CRM Enhancement
Enrich CRM records with behavioral and interest data:
{
"name": "get_person",
"arguments": {
"person_ids": [...],
"domains": ["interest", "affinity", "purchase", "lifestyle"],
"format": "csv"
}
}
Download the CSV and import it into your CRM with custom field mapping.
Email Normalization Tracking
Track all email variations associated with a person:
{
"name": "resolve_identities",
"arguments": {
"id_type": "email",
"id_hash": "plaintext",
"identifiers": [
"john.doe@gmail.com",
"johndoe@gmail.com",
"john.doe+spam@gmail.com"
]
}
}
All Gmail variations are automatically normalized to a canonical form and matched to the same person.
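The service performs this normalization for you, and its exact rules are not documented here. Purely as an illustration of why the three addresses above collapse to one, the sketch below applies the commonly cited Gmail conventions (dots in the local part are ignored, and anything after a plus sign is dropped). The function canonicalizeGmail is hypothetical and may not match the service's behavior.

```javascript
// Illustrative only: applies common Gmail conventions to gmail.com addresses.
// Assumption: dots and "+tag" suffixes in the local part are not significant.
function canonicalizeGmail(email) {
  const lowered = email.trim().toLowerCase();
  const parts = lowered.split("@");
  if (parts.length !== 2 || parts[1] !== "gmail.com") {
    return lowered; // leave other domains and malformed input unchanged
  }
  const base = parts[0].split("+")[0].replace(/\./g, "");
  return `${base}@gmail.com`;
}

const variations = ["john.doe@gmail.com", "johndoe@gmail.com", "john.doe+spam@gmail.com"];
console.log(new Set(variations.map(canonicalizeGmail)).size); // 1
```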
Address Geocoding
Resolve addresses and retrieve geographic coordinates:
{
"name": "resolve_identities",
"arguments": {
"id_type": "address",
"id_hash": "plaintext",
"identifiers": [
"123 Main St, San Francisco, CA 94105"
]
}
}
Response includes normalized address and coordinates:
{
"person_id": 12345,
"address": {
"normalized_address": "123 Main St, San Francisco, CA 94105, USA",
"latitude": 37.7749,
"longitude": -122.4194,
"distance_meters": 0
}
}
Error Handling
Low Match Rate (stats.rate < 0.5)
Causes:
- Invalid or outdated identifiers
- Incorrect hash type
- Poor identifier formatting
- Low data coverage for specific segment
Solutions:
- Validate identifier format before submission
- Try multi_identifiers for better match rates
- Use plaintext instead of hashed for best results
- Check phone formatting (digits only, no country code)
- Verify email lowercase normalization
Missing Person IDs After Resolution
Causes:
- Identifiers not found in database
- Quality threshold too high
Solutions:
- Check stats.resolved vs stats.requested
- Lower quality score threshold
- Use alternative identifier types
- Verify identifier validity
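Because only matched identifiers are returned, finding the ones that failed to resolve means comparing your input against the matches in the response. The sketch below does this under the assumption that criterion_value echoes the submitted value exactly, as it does in the sample responses; findUnmatched is a hypothetical helper.

```javascript
// Returns the submitted identifiers that do not appear as a
// criterion_value in any match of any returned identity.
function findUnmatched(submitted, identities) {
  const matched = new Set(
    identities.flatMap((identity) => identity.matches.map((m) => m.criterion_value))
  );
  return submitted.filter((value) => !matched.has(value));
}

const submittedEmails = ["customer1@example.com", "customer2@example.com", "customer3@example.com"];
const returnedIdentities = [
  { person_id: 12345, matches: [{ criterion_value: "customer1@example.com" }] },
  { person_id: 12346, matches: [{ criterion_value: "customer2@example.com" }] },
];
console.log(findUnmatched(submittedEmails, returnedIdentities)); // [ 'customer3@example.com' ]
```

The unmatched list is a natural input for a retry with an alternative identifier type.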
Empty Profiles (domains: {})
Causes:
- Person exists but has no data for requested domains
- Low data quality for specific person
Solutions:
- Request more domains to increase coverage
- Check metadata.quality_score for data confidence
- Filter profiles by quality score before processing
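A small filter can drop empty or low-confidence profiles before they reach downstream processing. In this sketch the 0.5 default threshold is an arbitrary illustration and usableProfiles is a hypothetical helper; profiles without a metadata.quality_score are treated as scoring 0.

```javascript
// Keeps profiles that have at least one attribute and meet the quality threshold.
function usableProfiles(profiles, minQuality = 0.5) {
  return profiles.filter((profile) => {
    const attributeCount = Object.keys(profile.domains || {}).length;
    const quality = profile.metadata?.quality_score ?? 0;
    return attributeCount > 0 && quality >= minQuality;
  });
}

const sampleProfiles = [
  { person_id: "12345", metadata: { quality_score: 0.92 }, domains: { age_range: "35-44" } },
  { person_id: "12347", metadata: { quality_score: 0.9 }, domains: {} },
  { person_id: "12348", metadata: { quality_score: 0.3 }, domains: { gender: "Male" } },
];
console.log(usableProfiles(sampleProfiles).map((p) => p.person_id)); // [ '12345' ]
```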
Export URL Expired
Causes:
- URL accessed after 1-hour expiration
- Long processing delay between request and download
Solutions:
- Download immediately after receiving URL
- Store file on your infrastructure if needed beyond 1 hour
- Re-run request to generate new export URL
- Use workflow_id to track retries
Batch Size Exceeded
Causes:
- More than 1,000 person IDs in get_person request
Solutions:
- Split into batches of 1,000 or fewer
- Process batches sequentially or in parallel
- Use same workflow_id for all batches
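For the parallel option, a bounded worker pool keeps several batches in flight without sending them all at once. The sketch below is one possible shape: chunk and processBatches are hypothetical helpers, the worker argument stands in for your own get_person wrapper, and the default concurrency of 3 is an arbitrary choice because this documentation does not state rate limits.

```javascript
// Splits an array into consecutive slices of at most `size` items.
function chunk(items, size) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Runs an async worker over each batch, with at most `concurrency` batches
// in flight, and returns the combined results in batch order.
async function processBatches(personIds, worker, { batchSize = 1000, concurrency = 3 } = {}) {
  const batches = chunk(personIds, batchSize);
  const results = new Array(batches.length);
  let next = 0;
  async function runner() {
    while (next < batches.length) {
      const index = next++;
      results[index] = await worker(batches[index]);
    }
  }
  const runners = Array.from({ length: Math.min(concurrency, batches.length) }, () => runner());
  await Promise.all(runners);
  return results.flat();
}

const manyIds = Array.from({ length: 2500 }, (_, i) => String(i + 1));
console.log(chunk(manyIds, 1000).map((b) => b.length)); // [ 1000, 1000, 500 ]
```

A call might look like processBatches(ids, (batch) => getPerson({ person_ids: batch, domains, workflow_id }).then((r) => r.profiles)), where getPerson is your own wrapper.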
Next Steps
After enriching your customer data:
- Import to CRM - Update records with enriched attributes
- Build segments - Group customers by characteristics
- Personalize campaigns - Use interests and affinities for targeting
- Score leads - Prioritize by demographic fit
- Data validation - Compare enriched data with existing records
- Provide feedback - Use submit_feedback tool to report data quality issues
Related Workflows:
- ICP Analysis - Analyze enriched customers to discover defining characteristics
- Criteria-Based Audiences - Build audiences using enriched cluster data
Best Practices:
- Start with small test batches to validate data quality
- Request only needed domains to optimize performance
- Use multi_identifiers for maximum match rates
- Filter by quality scores to ensure data confidence
- Track workflows with workflow_id for debugging
- Export large datasets immediately to avoid URL expiration
- Batch requests efficiently (1,000 person IDs per call)
- Cache enriched data to minimize API calls