You’re viewing the V1 docs. V2 is now recommended — read the V2 docs.
Watt Data

Identity Enrichment

Overview

The Identity Enrichment workflow enables you to enrich customer identifiers (emails, phone numbers, or physical addresses) with comprehensive demographic, behavioral, and interest data. This workflow supports lists of arbitrary size, from single identifiers to millions of records.

Use Case

Goal: Given customer identifiers (emails, phone numbers, or addresses), retrieve detailed profile data including demographics, interests, affinities, purchase behavior, and more.

Business Value:

  • Enrich CRM data with demographic and behavioral attributes
  • Personalize marketing campaigns based on customer profiles
  • Score leads based on demographic fit
  • Append missing contact information (phone, address)
  • Validate and update existing customer records
  • Build customer segments for targeted messaging

Workflow Steps

Step 1: Resolve Identifiers to Person IDs

Convert customer identifiers into standardized person IDs that can be used for enrichment.

Tool: resolve_identities

Single Identifier Type:

Use when you have one type of identifier (e.g., only emails).

Request:

{
  "jsonrpc": "2.0",
  "method": "tools/call",
  "params": {
    "name": "resolve_identities",
    "arguments": {
      "id_type": "email",
      "id_hash": "plaintext",
      "identifiers": [
        "customer1@example.com",
        "customer2@example.com",
        "customer3@example.com"
      ]
    }
  },
  "id": 1
}

Response:

{
  "jsonrpc": "2.0",
  "result": {
    "identities": [
      {
        "person_id": 12345,
        "overall_quality_score": 0.95,
        "matches": [
          {
            "criterion_type": "email_plaintext",
            "criterion_value": "customer1@example.com",
            "quality_score": 0.95
          }
        ],
        "identifiers": {
          "email": ["customer1@example.com", "alt1@example.com"],
          "phone": ["5551234567"]
        }
      },
      {
        "person_id": 12346,
        "overall_quality_score": 0.88,
        "matches": [
          {
            "criterion_type": "email_plaintext",
            "criterion_value": "customer2@example.com",
            "quality_score": 0.88
          }
        ],
        "identifiers": {
          "email": ["customer2@example.com"]
        }
      }
    ],
    "stats": {
      "requested": 3,
      "resolved": 2,
      "rate": 0.67
    },
    "workflow_id": "b2c3d4e5-f6a7-8901-bcde-f2345678901a",
    "tool_trace_id": "trace_ghi789"
  },
  "id": 1
}

Multiple Identifier Types (Recommended):

Use when you have multiple types of identifiers. This provides better match rates by matching across identifier types.

Request:

{
  "jsonrpc": "2.0",
  "method": "tools/call",
  "params": {
    "name": "resolve_identities",
    "arguments": {
      "multi_identifiers": [
        {
          "id_type": "email",
          "hash_type": "plaintext",
          "values": [
            "customer1@example.com",
            "customer2@example.com"
          ]
        },
        {
          "id_type": "phone",
          "hash_type": "plaintext",
          "values": [
            "5551234567",
            "5559876543"
          ]
        }
      ]
    }
  },
  "id": 1
}

Response:

{
  "jsonrpc": "2.0",
  "result": {
    "identities": [
      {
        "person_id": 12345,
        "overall_quality_score": 0.9925,
        "matches": [
          {
            "criterion_type": "email_plaintext",
            "criterion_value": "customer1@example.com",
            "quality_score": 0.95
          },
          {
            "criterion_type": "phone_plaintext",
            "criterion_value": "5551234567",
            "quality_score": 0.85
          }
        ],
        "identifiers": {
          "email": ["customer1@example.com", "alt1@example.com"],
          "phone": ["5551234567", "5559999999"]
        }
      },
      {
        "person_id": 12346,
        "overall_quality_score": 0.88,
        "matches": [
          {
            "criterion_type": "phone_plaintext",
            "criterion_value": "5559876543",
            "quality_score": 0.88
          }
        ],
        "identifiers": {
          "phone": ["5559876543", "5551111111"]
        }
      }
    ],
    "stats": {
      "requested": 4,
      "resolved": 2,
      "rate": 0.50
    },
    "workflow_id": "b2c3d4e5-f6a7-8901-bcde-f2345678901a",
    "tool_trace_id": "trace_ghi789"
  },
  "id": 1
}
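As a rough sketch of how such a request might be assembled from customer records, the helper below groups emails and phones into one multi_identifiers payload. The function name and the record shape ({ email, phone }) are assumptions made for illustration; only the request fields come from the example above.

```javascript
// Hypothetical helper: builds a resolve_identities request that uses
// multi_identifiers. Field names follow the request shown above; the
// input record shape ({ email, phone }) is an assumption.
function buildMultiIdentifierRequest(records, requestId = 1) {
  const emails = new Set();
  const phones = new Set();
  for (const record of records) {
    if (record.email) emails.add(record.email);
    if (record.phone) phones.add(record.phone);
  }

  // Only include a group when it has at least one value
  const multiIdentifiers = [];
  if (emails.size > 0) {
    multiIdentifiers.push({ id_type: "email", hash_type: "plaintext", values: [...emails] });
  }
  if (phones.size > 0) {
    multiIdentifiers.push({ id_type: "phone", hash_type: "plaintext", values: [...phones] });
  }

  return {
    jsonrpc: "2.0",
    method: "tools/call",
    params: {
      name: "resolve_identities",
      arguments: { multi_identifiers: multiIdentifiers }
    },
    id: requestId
  };
}
```

Using sets removes duplicate values before submission, so repeated customer records do not inflate stats.requested.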

Hash Types:

| Hash Type | Description | Use Case |
| --- | --- | --- |
| plaintext | Raw, unhashed identifiers | Recommended for best match rates |
| md5 | MD5 hashed identifiers | Legacy systems, privacy requirements |
| sha1 | SHA-1 hashed identifiers | Security compliance |
| sha256 | SHA-256 hashed identifiers | High security requirements |

Hashed Identifier Example:

{
  "id_type": "email",
  "id_hash": "md5",
  "identifiers": [
    "5d41402abc4b2a76b9719d911017c592",
    "098f6bcd4621d373cade4e832627b4f6"
  ]
}
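If you need to produce hashed identifiers yourself, a minimal sketch using Node.js's built-in crypto module could look like the following. The helper names are hypothetical, and trimming and lowercasing before hashing is an assumption; confirm the normalization the service expects before relying on it.

```javascript
const nodeCrypto = require("node:crypto");

// Hash one identifier with the given algorithm ("md5", "sha1", or "sha256").
// Assumption: identifiers are trimmed and lowercased before hashing.
function hashIdentifier(value, algorithm = "sha256") {
  const normalized = value.trim().toLowerCase();
  return nodeCrypto.createHash(algorithm).update(normalized, "utf8").digest("hex");
}

// Build the arguments object for a hashed resolve_identities call,
// using the id_type / id_hash / identifiers fields shown above.
function buildHashedArguments(emails, algorithm = "sha256") {
  return {
    id_type: "email",
    id_hash: algorithm,
    identifiers: emails.map((email) => hashIdentifier(email, algorithm))
  };
}
```

The id_hash value must name the same algorithm used to produce the identifiers; a mismatch is one of the low-match-rate causes listed under Error Handling.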

Identifier Formatting:

| Type | Format | Example | Notes |
| --- | --- | --- | --- |
| Email | Lowercase, normalized | john@example.com | Gmail variations automatically normalized |
| Phone | Digits only, no country code (NDC+SN) | 5551234567 | No dashes, spaces, or +1 prefix |
| Address | Full standardized address | 123 Main St, San Francisco, CA 94105 | Plaintext only, automatically geocoded |
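The formatting rules above can be applied client-side before submission. The sketch below is one possible pre-processing step; the function names are hypothetical, and the phone logic assumes 10-digit North American numbers like those in the examples.

```javascript
// Lowercase and trim an email address.
function normalizeEmail(email) {
  return email.trim().toLowerCase();
}

// Reduce a phone number to digits only and drop a leading country code "1".
// Assumption: 10-digit North American numbers, as in the examples above.
// Returns null when the result is not exactly 10 digits.
function normalizePhone(phone) {
  let digits = phone.replace(/\D/g, "");
  if (digits.length === 11 && digits.startsWith("1")) {
    digits = digits.slice(1);
  }
  return digits.length === 10 ? digits : null;
}
```

Returning null for values that cannot be normalized lets you drop them before the request instead of spending a lookup on an identifier that is unlikely to match.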

Quality Scores:

  • quality_score (0-1): Confidence for individual match criterion
  • overall_quality_score (0-1): Combined confidence using Noisy-OR aggregation
    • Formula: 1 - ∏(1 - score_i)
    • Preserves strong signals: email=0.9, phone=0.1 → 0.91
    • Reinforces weak signals: 0.6, 0.6 → 0.84
  • Filter by score to ensure match quality (e.g., >= 0.5)
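The Noisy-OR formula and the score filter can be sketched in a few lines. This is an illustration of the formula as stated above, not the service's implementation, and the function names are hypothetical.

```javascript
// Noisy-OR aggregation: 1 - product of (1 - score_i).
// An empty list yields 0 (no evidence).
function noisyOr(scores) {
  return 1 - scores.reduce((product, score) => product * (1 - score), 1);
}

// Keep only identities at or above a minimum overall_quality_score.
function filterByQuality(identities, minScore = 0.5) {
  return identities.filter((identity) => identity.overall_quality_score >= minScore);
}
```

For the two cases listed above, noisyOr([0.9, 0.1]) evaluates to approximately 0.91 and noisyOr([0.6, 0.6]) to approximately 0.84, up to floating-point rounding.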

Key Points:

  • Returns only matched identifiers (check stats.rate for the match rate, expressed as a fraction from 0 to 1)
  • multi_identifiers provides better match rates by searching across multiple types
  • Identifiers are grouped by type in the response
  • Use person_id values in Step 2 for enrichment
  • No per-request limit on number of identifiers
  • For large datasets (greater than 100K identifiers), use export format
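Because only matched identifiers come back, it can help to work out which inputs were not resolved. The sketch below assumes the response shape shown in the examples above and assumes that criterion_value echoes the submitted string exactly, which may not hold for hashed or normalized inputs. The function name is hypothetical.

```javascript
// Summarize a resolve_identities result: the person IDs to pass to Step 2
// and the submitted identifiers that had no match.
// Assumption: criterion_value equals the submitted identifier string.
function summarizeResolution(result, submitted) {
  const matched = new Set();
  const personIds = [];
  for (const identity of result.identities) {
    // get_person examples use string IDs, so convert here
    personIds.push(String(identity.person_id));
    for (const match of identity.matches) {
      matched.add(match.criterion_value);
    }
  }
  const unmatched = submitted.filter((value) => !matched.has(value));
  return { personIds, unmatched, rate: result.stats.rate };
}
```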

Step 2: Enrich Person Profiles

Retrieve detailed profile data for resolved person IDs.

Tool: get_person

Request:

{
  "jsonrpc": "2.0",
  "method": "tools/call",
  "params": {
    "name": "get_person",
    "arguments": {
      "person_ids": ["12345", "12346"],
      "domains": [
        "demographic",
        "interest",
        "affinity",
        "lifestyle",
        "household",
        "financial"
      ],
      "format": "none"
    }
  },
  "id": 2
}

Response:

{
  "jsonrpc": "2.0",
  "result": {
    "profiles": [
      {
        "person_id": "12345",
        "metadata": {
          "quality_score": 0.92,
          "last_modified": "2025-01-15T10:30:00Z"
        },
        "domains": {
          "age_range": "35-44",
          "gender": "Female",
          "education": "Bachelor Degree",
          "household_income_range": "$100K-$150K",
          "marital_status": "Married",
          "interested_fitness": "Yes",
          "interested_healthy_living": "Yes",
          "interested_cooking": "Yes",
          "health_affinity": "High",
          "family_affinity": "Medium",
          "is_health_conscious": "Yes",
          "is_pet_owner": "Yes",
          "number_children_in_household": "2",
          "household_net_worth_range": "$250K-$500K"
        }
      },
      {
        "person_id": "12346",
        "metadata": {
          "quality_score": 0.88,
          "last_modified": "2025-01-14T08:20:00Z"
        },
        "domains": {
          "age_range": "45-54",
          "gender": "Male",
          "education": "Graduate Degree",
          "household_income_range": "$150K+",
          "interested_golf": "Yes",
          "interested_fitness": "Yes",
          "auto_affinity": "High",
          "health_affinity": "Medium",
          "is_home_owner": "Yes",
          "is_investor": "Yes"
        }
      }
    ],
    "workflow_id": "b2c3d4e5-f6a7-8901-bcde-f2345678901a",
    "tool_trace_id": "trace_jkl012"
  },
  "id": 2
}

Available Domains (18 total):

| Domain | Description | Example Attributes |
| --- | --- | --- |
| demographic | Age, gender, education, income, ethnicity | age_range, gender, education, household_income_range |
| interest | Hobbies, activities, interests | interested_fitness, interested_golf, interested_cooking |
| affinity | Brand and category affinities | health_affinity, auto_affinity, family_affinity |
| lifestyle | Lifestyle attributes and behaviors | is_health_conscious, is_pet_owner, is_home_owner |
| household | Household composition | number_children_in_household, number_adults_in_household |
| financial | Financial status and credit | household_net_worth_range, credit_rating_range, owns_investments |
| purchase | Purchase history and behavior | purchased_clothing, purchased_books, apparel_purchases_total_spend |
| employment | Job and career information | occupation_category |
| political | Political affiliations | political_party_affiliation, donated_political_cause_recently |
| email | Email addresses | email1, email2, email3 |
| phone | Phone numbers | phone1, phone2, phone3 |
| address | Physical addresses | address1, address2, address3 |
| name | Person names | first_name, last_name |
| id | Identity information | Various identity attributes |
| maid | Mobile advertising IDs | Mobile ad identifiers |
| content | Content consumption patterns | Reading habits, media consumption |
| intent_category | Consumer intent categories | High-level intent signals |
| intent_topic | Specific intent topics | Detailed intent topics |

Domain Selection Strategy:

For Lead Scoring:

"domains": ["demographic", "financial", "lifestyle", "intent_category"]

For Personalization:

"domains": ["interest", "affinity", "content", "purchase"]

For Contact Appending:

"domains": ["email", "phone", "address", "name"]

For Comprehensive Profiles:

"domains": [
  "address", "affinity", "content", "demographic", "email",
  "employment", "financial", "household", "id", "intent_category",
  "intent_topic", "interest", "lifestyle", "maid", "name",
  "phone", "political", "purchase"
]

Key Points:

  • Maximum 1,000 person IDs per request (batch larger lists)
  • Attributes returned as flat key-value pairs in domains object
  • Multi-value fields (email, phone, address) are numbered: email1, email2, email3
  • Request only needed domains to optimize response size and performance
  • profiles array always populated, even when using export format
  • Missing attributes are omitted from response (not returned as null)
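Since multi-value fields arrive as numbered keys and missing attributes are simply absent, a small helper can gather them into arrays. This is a sketch with a hypothetical function name; it relies only on the flat key naming described above.

```javascript
// Collect numbered multi-value attributes (email1, email2, ...) from a
// profile's flat domains object into an ordered array.
// Returns an empty array when no matching keys are present.
function collectNumbered(domains, prefix) {
  return Object.keys(domains)
    .filter((key) => key.startsWith(prefix) && /^\d+$/.test(key.slice(prefix.length)))
    .sort((a, b) => Number(a.slice(prefix.length)) - Number(b.slice(prefix.length)))
    .map((key) => domains[key]);
}
```

Calling collectNumbered(profile.domains, "email") on a profile with email1 and email2 returns both values in order, and returns an empty array for a profile with no email attributes.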

Complete Workflow Summary

Data Flow Example:

Input:
  - customer1@example.com
  - customer2@example.com
  - 5551234567

↓ resolve_identities

Output:
  - person_id: 12345 (customer1@example.com)
  - person_id: 12346 (5551234567)

↓ get_person

Output:
  - person_id: 12345 → {age_range: "35-44", gender: "Female", ...}
  - person_id: 12346 → {age_range: "45-54", gender: "Male", ...}
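The two steps can be chained in code roughly as follows. This is a sketch, not a reference client: callTool(name, args) is a hypothetical function you supply that sends the tools/call request and returns the result object, and the sketch does not batch lists of more than 1,000 person IDs.

```javascript
// End-to-end sketch: resolve emails to person IDs, then enrich them.
// `callTool(name, args)` is a hypothetical function you supply; it should
// send the tools/call request and return the `result` object.
async function enrichIdentifiers(callTool, emails, domains, workflowId) {
  const resolution = await callTool("resolve_identities", {
    id_type: "email",
    id_hash: "plaintext",
    identifiers: emails,
    workflow_id: workflowId
  });

  // get_person examples use string IDs, so convert here
  const personIds = resolution.identities.map((identity) => String(identity.person_id));
  if (personIds.length === 0) return [];

  const enrichment = await callTool("get_person", {
    person_ids: personIds,
    domains,
    format: "none",
    workflow_id: workflowId
  });
  return enrichment.profiles;
}
```

Injecting callTool keeps the sketch independent of any particular transport or client library and makes it straightforward to test against canned responses.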

Performance Considerations

Batching Strategy

Identity Resolution:

  • No per-request limit on identifiers
  • For small lists (fewer than 1,000 identifiers): Use format: "none" for inline results
  • For large lists (1,000 identifiers or more): Use format: "csv" or format: "json" for export

Profile Enrichment:

  • Maximum 1,000 person IDs per request
  • Batch larger lists in chunks of 1,000

Example Batching Logic:

// Split a large person ID list into batches of at most `batchSize` and enrich each one.
// `getPerson` is assumed to be your own async wrapper around the get_person tool call.
async function enrichInBatches(getPerson, personIds, workflowId, batchSize = 1000) {
  const batches = [];
  for (let i = 0; i < personIds.length; i += batchSize) {
    batches.push(personIds.slice(i, i + batchSize));
  }

  // Process each batch sequentially, reusing one workflow_id for tracing
  const allProfiles = [];
  for (const batch of batches) {
    const response = await getPerson({
      person_ids: batch,
      domains: ["demographic", "interest", "affinity"],
      workflow_id: workflowId
    });
    allProfiles.push(...response.profiles);
  }
  return allProfiles;
}

Export Formats

| Format | Best For | Use Case |
| --- | --- | --- |
| none | Small datasets (less than 100 records) | Real-time enrichment, API responses |
| csv | Medium/large datasets | CRM imports, spreadsheet analysis |
| json | Programmatic processing | Application integration, data pipelines |
| jsonl | Very large datasets | Streaming, incremental processing |

Export Example:

{
  "name": "resolve_identities",
  "arguments": {
    "id_type": "email",
    "id_hash": "plaintext",
    "identifiers": [...],
    "format": "csv"
  }
}

Export Response:

{
  "identities": [...],
  "stats": {...},
  "export": {
    "url": "https://s3.amazonaws.com/presigned-url...",
    "format": "csv",
    "rows": 50000,
    "size_bytes": 15728640,
    "expires_at": "2025-01-18T13:00:00Z"
  }
}

Export URLs:

  • Valid for 1 hour from generation
  • Download immediately or store file on your infrastructure
  • Supports standard HTTP GET requests
  • No authentication required (presigned URL)
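A download step might be sketched as below, checking expires_at before issuing a plain GET. The function names are hypothetical; fetch is the global available in Node.js 18+ and modern browsers, and it is passed in as a parameter so it can be replaced in tests.

```javascript
// True when the presigned URL has expired (or will within `marginMs`).
function isExportExpired(exportInfo, now = new Date(), marginMs = 0) {
  return new Date(exportInfo.expires_at).getTime() - marginMs <= now.getTime();
}

// Download the export body as text with a plain HTTP GET.
// `exportInfo` is the `export` object from the response shown above.
async function downloadExport(exportInfo, fetchImpl = globalThis.fetch) {
  if (isExportExpired(exportInfo)) {
    throw new Error("Export URL expired; re-run the request to get a new one");
  }
  const response = await fetchImpl(exportInfo.url);
  if (!response.ok) {
    throw new Error(`Export download failed with HTTP ${response.status}`);
  }
  return response.text();
}
```

For large files you would likely stream the response to disk rather than buffer it as text; the sketch keeps the simpler form for brevity.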

Workflow ID Tracking

Use the same workflow_id across related requests for end-to-end tracing:

Step 1: Identity resolution

{
  "name": "resolve_identities",
  "arguments": {
    "id_type": "email",
    "id_hash": "plaintext",
    "identifiers": [...],
    "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
  }
}

Step 2: Profile enrichment (use same workflow_id)

{
  "name": "get_person",
  "arguments": {
    "person_ids": [...],
    "domains": [...],
    "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
  }
}

Benefits:

  • End-to-end request tracing in observability platform
  • Performance analysis across tools
  • Support troubleshooting with correlated logs
  • Feedback submission for data quality issues
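One way to keep the ID consistent is to generate it once and stamp it onto every call's arguments. The sketch below uses Node.js's crypto.randomUUID; the createWorkflow helper is hypothetical, and it assumes a client-generated UUID is acceptable, as the example values above suggest.

```javascript
const { randomUUID } = require("node:crypto");

// Create a workflow context that stamps every tool call's arguments
// with one shared workflow_id.
// Assumption: the service accepts a client-generated UUID.
function createWorkflow(workflowId = randomUUID()) {
  return {
    workflowId,
    withWorkflow: (args) => ({ ...args, workflow_id: workflowId })
  };
}
```

A caller would create one context per enrichment run and wrap the arguments of both resolve_identities and get_person with withWorkflow before sending them.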

Performance Benchmarks

| Operation | Records | Typical Latency |
| --- | --- | --- |
| resolve_identities (inline) | 100 | less than 1 second |
| resolve_identities (inline) | 1,000 | 1-2 seconds |
| resolve_identities (export) | 100,000 | 5-10 seconds |
| get_person | 100 | less than 1 second |
| get_person | 1,000 | 2-3 seconds |

Common Variations

Contact Appending

Add missing email, phone, or address to existing records:

{
  "name": "get_person",
  "arguments": {
    "person_ids": ["12345", "12346"],
    "domains": ["email", "phone", "address", "name"]
  }
}

Response includes up to 3 values per identifier type:

{
  "person_id": "12345",
  "domains": {
    "email1": "primary@example.com",
    "email2": "secondary@example.com",
    "phone1": "5551234567",
    "phone2": "5559876543",
    "address1": "123 Main St, San Francisco, CA 94105",
    "first_name": "Jane",
    "last_name": "Smith"
  }
}

Lead Scoring

Enrich leads and score based on demographic fit:

// Enrich leads, then score each one on demographic fit.
// `getPerson` is assumed to be your own async wrapper around the get_person tool call.
async function scoreLeads(getPerson, leadIds) {
  const response = await getPerson({
    person_ids: leadIds,
    domains: ["demographic", "financial", "lifestyle", "intent_category"]
  });

  // Score each lead; attributes missing from a profile simply add no points
  return response.profiles.map(profile => {
    const attrs = profile.domains;
    let score = 0;

    if (attrs.household_income_range === "$150K+") score += 30;
    if (attrs.household_net_worth_range === "$500K+") score += 30;
    if (attrs.education === "Graduate Degree") score += 20;
    if (attrs.is_home_owner === "Yes") score += 10;
    if (attrs.is_investor === "Yes") score += 10;

    return { person_id: profile.person_id, score };
  });
}

CRM Enhancement

Enrich CRM records with behavioral and interest data:

{
  "name": "get_person",
  "arguments": {
    "person_ids": [...],
    "domains": ["interest", "affinity", "purchase", "lifestyle"],
    "format": "csv"
  }
}

Download the CSV and import it into your CRM with custom field mapping.

Email Normalization Tracking

Track all email variations associated with a person:

{
  "name": "resolve_identities",
  "arguments": {
    "id_type": "email",
    "id_hash": "plaintext",
    "identifiers": [
      "john.doe@gmail.com",
      "johndoe@gmail.com",
      "john.doe+spam@gmail.com"
    ]
  }
}

All Gmail variations are automatically normalized to a canonical form and matched to the same person.
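To deduplicate your own list before submission, you could approximate this normalization client-side. The sketch below applies commonly cited Gmail rules (ignore dots and anything after "+" in the local part); it is an assumption about typical behavior, not the service's actual implementation, and the function name is hypothetical.

```javascript
// Illustrative Gmail canonicalization: lowercase, drop "+tag", and remove
// dots in the local part. Non-Gmail addresses are only lowercased.
// Assumption: typical Gmail rules, not the service's actual logic.
function canonicalizeGmail(email) {
  const lowered = email.trim().toLowerCase();
  const [local, domain] = lowered.split("@");
  if (!domain || domain !== "gmail.com") return lowered;
  const base = local.split("+")[0].replace(/\./g, "");
  return `${base}@${domain}`;
}
```

Under these rules, the three addresses in the request above all reduce to johndoe@gmail.com.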

Address Geocoding

Resolve addresses and retrieve geographic coordinates:

{
  "name": "resolve_identities",
  "arguments": {
    "id_type": "address",
    "id_hash": "plaintext",
    "identifiers": [
      "123 Main St, San Francisco, CA 94105"
    ]
  }
}

Response includes normalized address and coordinates:

{
  "person_id": 12345,
  "address": {
    "normalized_address": "123 Main St, San Francisco, CA 94105, USA",
    "latitude": 37.7749,
    "longitude": -122.4194,
    "distance_meters": 0
  }
}

Error Handling

Low Match Rate (stats.rate < 0.5)

Causes:

  • Invalid or outdated identifiers
  • Incorrect hash type
  • Poor identifier formatting
  • Low data coverage for specific segment

Solutions:

  • Validate identifier format before submission
  • Try multi_identifiers for better match rates
  • Use plaintext instead of hashed for best results
  • Check phone formatting (digits only, no country code)
  • Verify email lowercase normalization

Missing Person IDs After Resolution

Causes:

  • Identifiers not found in database
  • Quality threshold too high

Solutions:

  • Check stats.resolved vs stats.requested
  • Lower quality score threshold
  • Use alternative identifier types
  • Verify identifier validity

Empty Profiles (domains: {})

Causes:

  • Person exists but has no data for requested domains
  • Low data quality for specific person

Solutions:

  • Request more domains to increase coverage
  • Check metadata.quality_score for data confidence
  • Filter profiles by quality score before processing

Export URL Expired

Causes:

  • URL accessed after 1-hour expiration
  • Long processing delay between request and download

Solutions:

  • Download immediately after receiving URL
  • Store file on your infrastructure if needed beyond 1 hour
  • Re-run request to generate new export URL
  • Use workflow_id to track retries

Batch Size Exceeded

Causes:

  • More than 1,000 person IDs in get_person request

Solutions:

  • Split into batches of 1,000 or fewer
  • Process batches sequentially or in parallel
  • Use same workflow_id for all batches
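For parallel processing, a bounded number of requests in flight is usually safer than firing every batch at once. The sketch below is one way to do that; getPerson is a hypothetical async wrapper you supply around the get_person tool, and the default concurrency of 4 is an arbitrary illustrative value, not a documented limit.

```javascript
// Run get_person batches with a bounded number of requests in flight.
// `getPerson` is a hypothetical async wrapper around the get_person tool;
// `args` holds shared arguments such as domains and workflow_id.
async function enrichWithConcurrency(getPerson, personIds, args, concurrency = 4, batchSize = 1000) {
  const batches = [];
  for (let i = 0; i < personIds.length; i += batchSize) {
    batches.push(personIds.slice(i, i + batchSize));
  }

  const results = new Array(batches.length);
  let next = 0;

  // Each worker repeatedly claims the next unprocessed batch index
  async function worker() {
    while (next < batches.length) {
      const index = next;
      next += 1;
      const response = await getPerson({ ...args, person_ids: batches[index] });
      results[index] = response.profiles;
    }
  }

  const workerCount = Math.min(concurrency, batches.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results.flat();
}
```

Results are stored by batch index, so the returned profiles keep the order of the input list even when batches finish out of order.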

Next Steps

After enriching your customer data:

  1. Import to CRM - Update records with enriched attributes
  2. Build segments - Group customers by characteristics
  3. Personalize campaigns - Use interests and affinities for targeting
  4. Score leads - Prioritize by demographic fit
  5. Data validation - Compare enriched data with existing records
  6. Provide feedback - Use submit_feedback tool to report data quality issues

Best Practices:

  • Start with small test batches to validate data quality
  • Request only needed domains to optimize performance
  • Use multi_identifiers for maximum match rates
  • Filter by quality scores to ensure data confidence
  • Track workflows with workflow_id for debugging
  • Export large datasets immediately to avoid URL expiration
  • Batch requests efficiently (1,000 person IDs per call)
  • Cache enriched data to minimize API calls
