Identity Enrichment

Overview

The Identity Enrichment workflow enables you to enrich customer identifiers (emails, phone numbers, or physical addresses) with comprehensive demographic, behavioral, and interest data. This workflow supports lists of arbitrary size, from single identifiers to millions of records.

Use Case

Goal: Given customer identifiers (emails, phone numbers, or addresses), retrieve detailed profile data including demographics, interests, affinities, purchase behavior, and more.

Business Value:

Enrich CRM data with demographic and behavioral attributes
Personalize marketing campaigns based on customer profiles
Score leads based on demographic fit
Append missing contact information (phone, address)
Validate and update existing customer records
Build customer segments for targeted messaging

Workflow Overview

Workflow Steps

Step 1: Resolve Identifiers to Person IDs

Convert customer identifiers into standardized person IDs that can be used for enrichment.

Tool: resolve_identities

Single Identifier Type:

Use when you have one type of identifier (e.g., only emails).

Request:

{
  "jsonrpc": "2.0",
  "method": "tools/call",
  "params": {
    "name": "resolve_identities",
    "arguments": {
      "id_type": "email",
      "id_hash": "plaintext",
      "identifiers": [
        "customer1@example.com",
        "customer2@example.com",
        "customer3@example.com"
      ]
    }
  },
  "id": 1
}

Response:

{
  "jsonrpc": "2.0",
  "result": {
    "identities": [
      {
        "person_id": 12345,
        "overall_quality_score": 0.95,
        "matches": [
          {
            "criterion_type": "email_plaintext",
            "criterion_value": "customer1@example.com",
            "quality_score": 0.95
          }
        ],
        "identifiers": {
          "email": ["customer1@example.com", "alt1@example.com"],
          "phone": ["5551234567"]
        }
      },
      {
        "person_id": 12346,
        "overall_quality_score": 0.88,
        "matches": [
          {
            "criterion_type": "email_plaintext",
            "criterion_value": "customer2@example.com",
            "quality_score": 0.88
          }
        ],
        "identifiers": {
          "email": ["customer2@example.com"]
        }
      }
    ],
    "stats": {
      "requested": 3,
      "resolved": 2,
      "rate": 0.67
    },
    "workflow_id": "b2c3d4e5-f6a7-8901-bcde-f2345678901a",
    "tool_trace_id": "trace_ghi789"
  },
  "id": 1
}
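Every request in this workflow uses the same JSON-RPC 2.0 `tools/call` envelope; only `params.name` and `params.arguments` change between tools. A minimal sketch of building it in JavaScript, where `buildToolCall` is a hypothetical helper and the transport that actually sends the request is left out:

```javascript
// Wrap a tool name and its arguments in a JSON-RPC 2.0 "tools/call" request.
// buildToolCall is a hypothetical helper; sending the request is up to your client.
function buildToolCall(name, args, id) {
  return {
    jsonrpc: "2.0",
    method: "tools/call",
    params: { name, arguments: args },
    id
  };
}

// The single-identifier-type request from above
const request = buildToolCall(
  "resolve_identities",
  {
    id_type: "email",
    id_hash: "plaintext",
    identifiers: [
      "customer1@example.com",
      "customer2@example.com",
      "customer3@example.com"
    ]
  },
  1
);

console.log(JSON.stringify(request, null, 2));
```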

Multiple Identifier Types (Recommended):

Use when you have multiple types of identifiers. This provides better match rates by matching across identifier types.

Request:

{
  "jsonrpc": "2.0",
  "method": "tools/call",
  "params": {
    "name": "resolve_identities",
    "arguments": {
      "multi_identifiers": [
        {
          "id_type": "email",
          "hash_type": "plaintext",
          "values": [
            "customer1@example.com",
            "customer2@example.com"
          ]
        },
        {
          "id_type": "phone",
          "hash_type": "plaintext",
          "values": [
            "5551234567",
            "5559876543"
          ]
        }
      ]
    }
  },
  "id": 1
}

Response:

{
  "jsonrpc": "2.0",
  "result": {
    "identities": [
      {
        "person_id": 12345,
        "overall_quality_score": 0.99,
        "matches": [
          {
            "criterion_type": "email_plaintext",
            "criterion_value": "customer1@example.com",
            "quality_score": 0.95
          },
          {
            "criterion_type": "phone_plaintext",
            "criterion_value": "5551234567",
            "quality_score": 0.85
          }
        ],
        "identifiers": {
          "email": ["customer1@example.com", "alt1@example.com"],
          "phone": ["5551234567", "5559999999"]
        }
      },
      {
        "person_id": 12346,
        "overall_quality_score": 0.88,
        "matches": [
          {
            "criterion_type": "phone_plaintext",
            "criterion_value": "5559876543",
            "quality_score": 0.88
          }
        ],
        "identifiers": {
          "phone": ["5559876543", "5551111111"]
        }
      }
    ],
    "stats": {
      "requested": 4,
      "resolved": 2,
      "rate": 0.50
    },
    "workflow_id": "b2c3d4e5-f6a7-8901-bcde-f2345678901a",
    "tool_trace_id": "trace_ghi789"
  },
  "id": 1
}
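When identifiers arrive already grouped by type, the `multi_identifiers` argument can be assembled mechanically. The sketch below assumes input shaped as `{ email: [...], phone: [...] }`; `buildMultiIdentifiers` is a hypothetical name, and dropping empty groups and exact duplicates is a client-side choice, not documented service behavior:

```javascript
// Build the multi_identifiers argument from identifiers grouped by type.
// Empty groups are skipped and exact duplicates within a group are removed.
function buildMultiIdentifiers(byType, hashType = "plaintext") {
  return Object.entries(byType)
    .filter(([, values]) => values.length > 0)
    .map(([idType, values]) => ({
      id_type: idType,
      hash_type: hashType,
      values: [...new Set(values)]
    }));
}

const args = {
  multi_identifiers: buildMultiIdentifiers({
    email: ["customer1@example.com", "customer2@example.com"],
    phone: ["5551234567", "5559876543", "5551234567"], // duplicate dropped
    address: [] // empty group skipped
  })
};

console.log(JSON.stringify(args, null, 2));
```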

Hash Types:

Hash Type	Description	Use Case
`plaintext`	Raw, unhashed identifiers	Recommended for best match rates
`md5`	MD5 hashed identifiers	Legacy systems, privacy requirements
`sha1`	SHA-1 hashed identifiers	Security compliance
`sha256`	SHA-256 hashed identifiers	High security requirements

Hashed Identifier Example:

{
  "id_type": "email",
  "id_hash": "md5",
  "identifiers": [
    "5d41402abc4b2a76b9719d911017c592",
    "098f6bcd4621d373cade4e832627b4f6"
  ]
}
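To send hashed emails, normalize first and then hash, so the digest matches one computed from the same normalized form. A sketch using Node's built-in `crypto` module, assuming the trim-and-lowercase normalization from the Identifier Formatting table is what the hashes should be computed over; whether the service applies further normalization (such as Gmail rules) to hashed values is not covered here:

```javascript
const { createHash } = require("node:crypto");

const SUPPORTED_HASHES = ["md5", "sha1", "sha256"];

// Normalize an email (trim + lowercase) and return its hex digest.
// Assumption: digests must be computed over this normalized form to match.
function hashEmail(email, algorithm) {
  if (!SUPPORTED_HASHES.includes(algorithm)) {
    throw new Error(`Unsupported hash type: ${algorithm}`);
  }
  const normalized = email.trim().toLowerCase();
  return createHash(algorithm).update(normalized, "utf8").digest("hex");
}

// Arguments for resolve_identities with SHA-256 hashed emails
const hashed = {
  id_type: "email",
  id_hash: "sha256",
  identifiers: ["Customer1@Example.com ", "customer2@example.com"].map(e =>
    hashEmail(e, "sha256")
  )
};

console.log(hashed);
```

For reference, the placeholder digests in the example above are the MD5 hashes of the strings `hello` and `test`, which makes them handy for a quick self-check.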

Identifier Formatting:

Type	Format	Example	Notes
Email	Lowercase, normalized	`john@example.com`	Gmail variations automatically normalized
Phone	Digits only, no country code (NDC+SN)	`5551234567`	No dashes, spaces, or +1 prefix
Address	Full standardized address	`123 Main St, San Francisco, CA 94105`	Plaintext only, automatically geocoded
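A sketch of client-side normalization into these formats. It assumes North American (NANP) phone numbers, where the national number is 10 digits and the country code is 1; other countries need different rules. The email check is deliberately loose, and returning `null` for inputs that cannot be normalized is a design choice so they can be reported rather than sent:

```javascript
// Phone: digits only, with a leading US/Canada country code "1" removed.
// Assumption: NANP numbers (10-digit national number).
function normalizePhone(raw) {
  let digits = raw.replace(/\D/g, ""); // strip spaces, dashes, dots, "+", parens
  if (digits.length === 11 && digits.startsWith("1")) {
    digits = digits.slice(1); // drop the country code
  }
  return digits.length === 10 ? digits : null; // null = cannot normalize
}

// Email: trimmed and lowercased, with a loose single-"@" sanity check.
function normalizeEmail(raw) {
  const email = raw.trim().toLowerCase();
  return /^[^@\s]+@[^@\s]+$/.test(email) ? email : null;
}

const phones = ["+1 (555) 123-4567", "555.987.6543", "12345"].map(normalizePhone);
const emails = [" John@Example.com", "not-an-email"].map(normalizeEmail);

console.log(phones); // [ '5551234567', '5559876543', null ]
console.log(emails); // [ 'john@example.com', null ]
```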

Quality Scores:

quality_score (0-1): Confidence for individual match criterion
overall_quality_score (0-1): Combined confidence using Noisy-OR aggregation
- Formula: 1 - ∏(1 - score_i)
- Preserves strong signals: email=0.9, phone=0.1 → 0.91
- Reinforces weak signals: 0.6, 0.6 → 0.84
Filter on overall_quality_score to enforce a minimum match quality (e.g., >= 0.5)
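The Noisy-OR combination and the score filter translate directly into code. A minimal sketch; `noisyOr` and `filterByQuality` are illustrative names, and the 0.5 default mirrors the example threshold above and should be tuned to your tolerance for false matches:

```javascript
// Noisy-OR aggregation: overall = 1 - product over i of (1 - score_i)
function noisyOr(scores) {
  return 1 - scores.reduce((acc, s) => acc * (1 - s), 1);
}

// Keep only identities whose combined score meets the threshold.
function filterByQuality(identities, minScore = 0.5) {
  return identities.filter(i => i.overall_quality_score >= minScore);
}

console.log(noisyOr([0.9, 0.1]).toFixed(2)); // 0.91 (strong signal preserved)
console.log(noisyOr([0.6, 0.6]).toFixed(2)); // 0.84 (weak signals reinforced)
```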

Key Points:

Returns identities only for matched identifiers; unmatched inputs are omitted (check stats.rate for the match rate, a fraction from 0 to 1)
multi_identifiers provides better match rates by searching across multiple types
Identifiers are grouped by type in the response
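Use the person_id values in Step 2 for enrichment (resolve_identities returns them as numbers, while the get_person examples pass them as strings)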
No per-request limit on number of identifiers
For large datasets (greater than 1,000 identifiers), use an export format
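Putting these points together, a sketch that turns a `resolve_identities` result into the `person_ids` list for Step 2. It assumes the `result` shape shown in the responses above and converts IDs to strings because the `get_person` examples pass strings while `resolve_identities` returns numbers; `toPersonIds` and the sample data are illustrative:

```javascript
// Convert a resolve_identities result into get_person's person_ids list:
// apply a quality threshold, stringify IDs, and keep one entry per person.
function toPersonIds(result, minScore = 0.5) {
  const ids = result.identities
    .filter(identity => identity.overall_quality_score >= minScore)
    .map(identity => String(identity.person_id));
  return [...new Set(ids)];
}

// Illustrative result (same shape as the responses above)
const result = {
  identities: [
    { person_id: 12345, overall_quality_score: 0.95 },
    { person_id: 12346, overall_quality_score: 0.88 },
    { person_id: 12347, overall_quality_score: 0.31 }
  ],
  stats: { requested: 4, resolved: 3, rate: 0.75 }
};

console.log(toPersonIds(result)); // [ '12345', '12346' ]
console.log(`Match rate: ${(result.stats.rate * 100).toFixed(0)}%`); // Match rate: 75%
```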

Step 2: Enrich Person Profiles

Retrieve detailed profile data for resolved person IDs.

Tool: get_person

Request:

{
  "jsonrpc": "2.0",
  "method": "tools/call",
  "params": {
    "name": "get_person",
    "arguments": {
      "person_ids": ["12345", "12346"],
      "domains": [
        "demographic",
        "interest",
        "affinity",
        "lifestyle",
        "household",
        "financial"
      ],
      "format": "none"
    }
  },
  "id": 2
}

Response:

{
  "jsonrpc": "2.0",
  "result": {
    "profiles": [
      {
        "person_id": "12345",
        "metadata": {
          "quality_score": 0.92,
          "last_modified": "2025-01-15T10:30:00Z"
        },
        "domains": {
          "age_range": "35-44",
          "gender": "Female",
          "education": "Bachelor Degree",
          "household_income_range": "$100K-$150K",
          "marital_status": "Married",
          "interested_fitness": "Yes",
          "interested_healthy_living": "Yes",
          "interested_cooking": "Yes",
          "health_affinity": "High",
          "family_affinity": "Medium",
          "is_health_conscious": "Yes",
          "is_pet_owner": "Yes",
          "number_children_in_household": "2",
          "household_net_worth_range": "$250K-$500K"
        }
      },
      {
        "person_id": "12346",
        "metadata": {
          "quality_score": 0.88,
          "last_modified": "2025-01-14T08:20:00Z"
        },
        "domains": {
          "age_range": "45-54",
          "gender": "Male",
          "education": "Graduate Degree",
          "household_income_range": "$150K+",
          "interested_golf": "Yes",
          "interested_fitness": "Yes",
          "auto_affinity": "High",
          "health_affinity": "Medium",
          "is_home_owner": "Yes",
          "is_investor": "Yes"
        }
      }
    ],
    "workflow_id": "b2c3d4e5-f6a7-8901-bcde-f2345678901a",
    "tool_trace_id": "trace_jkl012"
  },
  "id": 2
}

Available Domains (18 total):

Domain	Description	Example Attributes
`demographic`	Age, gender, education, income, ethnicity	age_range, gender, education, household_income_range
`interest`	Hobbies, activities, interests	interested_fitness, interested_golf, interested_cooking
`affinity`	Brand and category affinities	health_affinity, auto_affinity, family_affinity
`lifestyle`	Lifestyle attributes and behaviors	is_health_conscious, is_pet_owner, is_home_owner
`household`	Household composition	number_children_in_household, number_adults_in_household
`financial`	Financial status and credit	household_net_worth_range, credit_rating_range, owns_investments
`purchase`	Purchase history and behavior	purchased_clothing, purchased_books, apparel_purchases_total_spend
`employment`	Job and career information	occupation_category
`political`	Political affiliations	political_party_affiliation, donated_political_cause_recently
`email`	Email addresses	email1, email2, email3
`phone`	Phone numbers	phone1, phone2, phone3
`address`	Physical addresses	address1, address2, address3
`name`	Person names	first_name, last_name
`id`	Identity information	Various identity attributes
`maid`	Mobile advertising IDs	Mobile ad identifiers
`content`	Content consumption patterns	Reading habits, media consumption
`intent_category`	Consumer intent categories	High-level intent signals
`intent_topic`	Specific intent topics	Detailed intent topics

Domain Selection Strategy:

For Lead Scoring:

"domains": ["demographic", "financial", "lifestyle", "intent_category"]

For Personalization:

"domains": ["interest", "affinity", "content", "purchase"]

For Contact Appending:

"domains": ["email", "phone", "address", "name"]

For Comprehensive Profiles:

"domains": [
  "address", "affinity", "content", "demographic", "email",
  "employment", "financial", "household", "id", "intent_category",
  "intent_topic", "interest", "lifestyle", "maid", "name",
  "phone", "political", "purchase"
]
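Keeping these presets in one place and checking names against the 18 documented domains catches typos (for example `demographics`) before a request is sent. A sketch; the preset keys and `resolveDomains` are hypothetical names:

```javascript
// The 18 domains listed in the Available Domains table
const KNOWN_DOMAINS = new Set([
  "address", "affinity", "content", "demographic", "email",
  "employment", "financial", "household", "id", "intent_category",
  "intent_topic", "interest", "lifestyle", "maid", "name",
  "phone", "political", "purchase"
]);

// Presets matching the strategies above
const DOMAIN_PRESETS = {
  leadScoring: ["demographic", "financial", "lifestyle", "intent_category"],
  personalization: ["interest", "affinity", "content", "purchase"],
  contactAppending: ["email", "phone", "address", "name"],
  comprehensive: [...KNOWN_DOMAINS]
};

// Accept a preset name or an explicit list; reject unknown domain names.
function resolveDomains(presetOrList) {
  const domains = Array.isArray(presetOrList)
    ? presetOrList
    : DOMAIN_PRESETS[presetOrList];
  if (!domains) throw new Error(`Unknown preset: ${presetOrList}`);
  const unknown = domains.filter(d => !KNOWN_DOMAINS.has(d));
  if (unknown.length > 0) {
    throw new Error(`Unknown domains: ${unknown.join(", ")}`);
  }
  return domains;
}

console.log(resolveDomains("leadScoring"));
```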

Key Points:

Maximum 1,000 person IDs per request (batch larger lists)
Attributes returned as flat key-value pairs in domains object
Multi-value fields (email, phone, address) are numbered: email1, email2, email3
Request only needed domains to optimize response size and performance
profiles array always populated, even when using export format
Missing attributes are omitted from response (not returned as null)

Complete Workflow Summary

Data Flow Example:

Input:
  - customer1@example.com
  - customer2@example.com
  - 5551234567

↓ resolve_identities

Output:
  - person_id: 12345 (customer1@example.com, 5551234567)
  - person_id: 12346 (customer2@example.com)

↓ get_person

Output:
  - person_id: 12345 → {age_range: "35-44", gender: "Female", ...}
  - person_id: 12346 → {age_range: "45-54", gender: "Male", ...}

Performance Considerations

Batching Strategy

Identity Resolution:

No per-request limit on identifiers
For small lists (less than 1,000): Use format: "none" for inline results
For large lists (greater than 1,000): Use format: "csv" or format: "json" for export

Profile Enrichment:

Maximum 1,000 person IDs per request
Batch larger lists in chunks of 1,000

Example Batching Logic:

// Split a large person ID list into batches of at most 1,000 (the
// get_person per-request maximum) and enrich each batch in turn.
// getPerson is your client's wrapper around the get_person tool call.
async function enrichAll(personIds, workflowId, getPerson) {
  const batchSize = 1000;
  const allProfiles = [];

  for (let i = 0; i < personIds.length; i += batchSize) {
    // The get_person examples use string IDs; resolve_identities returns numbers
    const batch = personIds.slice(i, i + batchSize).map(String);

    const response = await getPerson({
      person_ids: batch,
      domains: ["demographic", "interest", "affinity"],
      workflow_id: workflowId
    });
    allProfiles.push(...response.profiles);
  }

  return allProfiles;
}

Export Formats

Format	Best For	Use Case
`none`	Small datasets (less than 100 records)	Real-time enrichment, API responses
`csv`	Medium/large datasets	CRM imports, spreadsheet analysis
`json`	Programmatic processing	Application integration, data pipelines
`jsonl`	Very large datasets	Streaming, incremental processing

Export Example:

{
  "name": "resolve_identities",
  "arguments": {
    "id_type": "email",
    "id_hash": "plaintext",
    "identifiers": [...],
    "format": "csv"
  }
}

Export Response:

{
  "identities": [...],
  "stats": {...},
  "export": {
    "url": "https://s3.amazonaws.com/presigned-url...",
    "format": "csv",
    "rows": 50000,
    "size_bytes": 15728640,
    "expires_at": "2025-01-18T13:00:00Z"
  }
}

Export URLs:

Valid for 1 hour from generation
Download immediately or store file on your infrastructure
Supports standard HTTP GET requests
No authentication required (presigned URL)
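Because the presigned URL expires, it is worth checking `expires_at` before downloading. A sketch assuming Node 18 or later (for the global `fetch`) and the `export` object shape shown above; `isExpired`, `downloadExport`, and `parseJsonl` are hypothetical helper names:

```javascript
const { writeFile } = require("node:fs/promises");

// True once the export's expires_at timestamp has been reached.
function isExpired(exportInfo, now = new Date()) {
  return new Date(exportInfo.expires_at).getTime() <= now.getTime();
}

// Download the export file with a plain GET (presigned, so no auth header).
async function downloadExport(exportInfo, destPath) {
  if (isExpired(exportInfo)) {
    throw new Error("Export URL has expired");
  }
  const res = await fetch(exportInfo.url);
  if (!res.ok) throw new Error(`Download failed: HTTP ${res.status}`);
  await writeFile(destPath, Buffer.from(await res.arrayBuffer()));
  return destPath;
}

// For format "jsonl": one JSON object per non-empty line.
function parseJsonl(text) {
  return text
    .split("\n")
    .filter(line => line.trim() !== "")
    .map(line => JSON.parse(line));
}
```

This buffers the whole file in memory, which is fine for moderate exports; for very large `jsonl` files, stream the response body to disk and parse it line by line instead.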

Workflow ID Tracking

Use the same workflow_id across related requests for end-to-end tracing:

// Step 1: Identity resolution
{
  "name": "resolve_identities",
  "arguments": {
    "id_type": "email",
    "id_hash": "plaintext",
    "identifiers": [...],
    "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
  }
}

// Step 2: Profile enrichment (use same workflow_id)
{
  "name": "get_person",
  "arguments": {
    "person_ids": [...],
    "domains": [...],
    "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
  }
}

Benefits:

End-to-end request tracing in observability platform
Performance analysis across tools
Support troubleshooting with correlated logs
Feedback submission for data quality issues
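A sketch of generating one `workflow_id` per enrichment run and attaching it to every call in that run, using Node's `crypto.randomUUID()`. `withWorkflowId` is a hypothetical helper. The example responses in Steps 1 and 2 carry a `workflow_id` even though their requests did not set one, so reusing the value returned by `resolve_identities` may also work; that is an assumption, not documented behavior:

```javascript
const { randomUUID } = require("node:crypto");

// Return a copy of the tool arguments with the shared workflow_id attached.
function withWorkflowId(args, workflowId) {
  return { ...args, workflow_id: workflowId };
}

const workflowId = randomUUID(); // random version 4 UUID string

const resolveArgs = withWorkflowId(
  { id_type: "email", id_hash: "plaintext", identifiers: ["customer1@example.com"] },
  workflowId
);
const enrichArgs = withWorkflowId(
  { person_ids: ["12345"], domains: ["demographic"] },
  workflowId
);

console.log(resolveArgs.workflow_id === enrichArgs.workflow_id); // true
```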

Performance Benchmarks

Operation	Records	Typical Latency
resolve_identities (inline)	100	less than 1 second
resolve_identities (inline)	1,000	1-2 seconds
resolve_identities (export)	100,000	5-10 seconds
get_person	100	less than 1 second
get_person	1,000	2-3 seconds

Common Variations

Contact Appending

Add missing email, phone, or address to existing records:

{
  "name": "get_person",
  "arguments": {
    "person_ids": ["12345", "12346"],
    "domains": ["email", "phone", "address", "name"]
  }
}

Response includes up to 3 values per identifier type:

{
  "person_id": "12345",
  "domains": {
    "email1": "primary@example.com",
    "email2": "secondary@example.com",
    "phone1": "5551234567",
    "phone2": "5559876543",
    "address1": "123 Main St, San Francisco, CA 94105",
    "first_name": "Jane",
    "last_name": "Smith"
  }
}
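Since multi-value fields are flattened into numbered keys, a small helper can collapse them back into arrays for storage. A sketch; `collectNumbered` is a hypothetical name, and the input is the example response above:

```javascript
// Collect prefix1, prefix2, ... into an array ordered by number.
// Missing slots are simply absent, since the API omits missing attributes.
function collectNumbered(domains, prefix) {
  const pattern = new RegExp(`^${prefix}(\\d+)$`);
  return Object.entries(domains)
    .map(([key, value]) => [key.match(pattern), value])
    .filter(([match]) => match !== null)
    .sort(([a], [b]) => Number(a[1]) - Number(b[1]))
    .map(([, value]) => value);
}

// domains object from the example response above
const domains = {
  email1: "primary@example.com",
  email2: "secondary@example.com",
  phone1: "5551234567",
  phone2: "5559876543",
  address1: "123 Main St, San Francisco, CA 94105",
  first_name: "Jane",
  last_name: "Smith"
};

const contact = {
  emails: collectNumbered(domains, "email"),
  phones: collectNumbered(domains, "phone"),
  addresses: collectNumbered(domains, "address"),
  name: [domains.first_name, domains.last_name].filter(Boolean).join(" ")
};

console.log(contact);
```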

Lead Scoring

Enrich leads and score based on demographic fit:

// leadIds: person IDs from Step 1, as strings
const response = await getPerson({
  person_ids: leadIds,
  domains: ["demographic", "financial", "lifestyle", "intent_category"]
});

// Score each lead. Missing attributes are omitted from profiles, so an
// absent value simply fails its comparison and adds nothing.
// Bucket labels (e.g. "$150K+") must match the values the API returns.
const scoredLeads = response.profiles.map(profile => {
  const d = profile.domains;
  let score = 0;

  if (d.household_income_range === "$150K+") score += 30;
  if (d.household_net_worth_range === "$500K+") score += 30;
  if (d.education === "Graduate Degree") score += 20;
  if (d.is_home_owner === "Yes") score += 10;
  if (d.is_investor === "Yes") score += 10;

  return { person_id: profile.person_id, score };
});
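The same rules can also be kept as data, which makes weights easy to tune and lets you rank and threshold leads. A sketch that reproduces the scoring above; the sample profiles are abbreviated from the `get_person` example response, and bucket labels such as `$500K+` are assumed to match what the API returns:

```javascript
// Scoring rules as data: attribute, required value, and points awarded.
const RULES = [
  { attr: "household_income_range", equals: "$150K+", points: 30 },
  { attr: "household_net_worth_range", equals: "$500K+", points: 30 },
  { attr: "education", equals: "Graduate Degree", points: 20 },
  { attr: "is_home_owner", equals: "Yes", points: 10 },
  { attr: "is_investor", equals: "Yes", points: 10 }
];

// Missing attributes are omitted from profiles, so they never match a rule.
function scoreProfile(profile, rules = RULES) {
  return rules.reduce(
    (score, rule) =>
      profile.domains[rule.attr] === rule.equals ? score + rule.points : score,
    0
  );
}

// Score every profile, drop those below minScore, highest score first.
function rankLeads(profiles, minScore = 0) {
  return profiles
    .map(profile => ({ person_id: profile.person_id, score: scoreProfile(profile) }))
    .filter(lead => lead.score >= minScore)
    .sort((a, b) => b.score - a.score);
}

// Abbreviated from the get_person example response
const sample = [
  {
    person_id: "12345",
    domains: {
      household_income_range: "$100K-$150K",
      education: "Bachelor Degree",
      household_net_worth_range: "$250K-$500K"
    }
  },
  {
    person_id: "12346",
    domains: {
      household_income_range: "$150K+",
      education: "Graduate Degree",
      is_home_owner: "Yes",
      is_investor: "Yes"
    }
  }
];

console.log(rankLeads(sample));
```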

CRM Enhancement

Enrich CRM records with behavioral and interest data:

{
  "name": "get_person",
  "arguments": {
    "person_ids": [...],
    "domains": ["interest", "affinity", "purchase", "lifestyle"],
    "format": "csv"
  }
}

Download the CSV from the returned export URL and import it into your CRM using a custom field mapping.
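If you work with the JSON `profiles` array instead (the Step 2 Key Points note it is populated even when an export format is requested), the field mapping can be applied in code. A sketch; `FIELD_MAP` and its CRM field names are hypothetical placeholders for your CRM's custom fields:

```javascript
// API attribute name -> CRM field name (hypothetical CRM fields)
const FIELD_MAP = {
  person_id: "External_ID",
  interested_fitness: "Interest_Fitness",
  health_affinity: "Affinity_Health",
  is_pet_owner: "Pet_Owner"
};

// Produce one CRM record containing only the mapped fields.
function toCrmRecord(profile, fieldMap = FIELD_MAP) {
  const flat = { person_id: profile.person_id, ...profile.domains };
  const record = {};
  for (const [apiField, crmField] of Object.entries(fieldMap)) {
    // The API omits missing attributes; write an explicit empty value instead
    record[crmField] = flat[apiField] ?? "";
  }
  return record;
}

// Abbreviated from the get_person example response
const profile = {
  person_id: "12345",
  domains: { interested_fitness: "Yes", health_affinity: "High", is_pet_owner: "Yes" }
};

console.log(toCrmRecord(profile));
```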

Email Normalization Tracking

Track all email variations associated with a person:

{
  "name": "resolve_identities",
  "arguments": {
    "id_type": "email",
    "id_hash": "plaintext",
    "identifiers": [
      "john.doe@gmail.com",
      "johndoe@gmail.com",
      "john.doe+spam@gmail.com"
    ]
  }
}

All Gmail variations automatically normalize to canonical form and match to same person.
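To de-duplicate a list locally before sending it, a client-side approximation of Gmail canonicalization can help. This sketch assumes the common Gmail rules (dots in the local part ignored, `+tag` suffixes ignored, `googlemail.com` equivalent to `gmail.com`); the service's own normalization may differ, and `canonicalGmail` is a hypothetical name:

```javascript
// Approximate Gmail canonical form; other domains are only trimmed and lowercased.
function canonicalGmail(email) {
  const parts = email.trim().toLowerCase().split("@");
  if (parts.length !== 2) return null; // not a single-@ address
  const [local, domain] = parts;
  if (domain !== "gmail.com" && domain !== "googlemail.com") {
    return `${local}@${domain}`;
  }
  const base = local.split("+")[0].replace(/\./g, ""); // drop +tag, then dots
  return `${base}@gmail.com`;
}

const inputs = ["john.doe@gmail.com", "johndoe@gmail.com", "john.doe+spam@gmail.com"];
console.log(new Set(inputs.map(canonicalGmail)).size); // 1
```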

Address Geocoding

Resolve addresses and retrieve geographic coordinates:

{
  "name": "resolve_identities",
  "arguments": {
    "id_type": "address",
    "id_hash": "plaintext",
    "identifiers": [
      "123 Main St, San Francisco, CA 94105"
    ]
  }
}

Response includes normalized address and coordinates:

{
  "person_id": 12345,
  "address": {
    "normalized_address": "123 Main St, San Francisco, CA 94105, USA",
    "latitude": 37.7749,
    "longitude": -122.4194,
    "distance_meters": 0
  }
}