Resolve entity identities by matching emails, phones, addresses, or MAIDs. Supports multi-criterion queries with Noisy-OR quality score aggregation. Returns entity IDs grouped by individual with quality scores.

Quick Example

{
  "entity_type": "person",
  "identifiers": [
    {
      "id_type": "email",
      "hash_type": "plaintext",
      "values": ["alice@example.com", "bob@example.com"]
    }
  ]
}

Input Parameters

| Parameter | Type | Required | Default | Constraints | Description |
| --- | --- | --- | --- | --- | --- |
| entity_type | string | Yes | - | "person" or "business" | Type of entity to resolve |
| identifiers | array | Conditional | - | Array of identifier objects | Multi-criterion identifiers. Mutually exclusive with csv_resource_uri |
| csv_resource_uri | string | Conditional | - | workflow:// URI | CSV file with identifiers. Mutually exclusive with identifiers |
| email_columns | object | No | - | { names: string[], hash_type?: string } | Email columns (only with csv_resource_uri) |
| phone_columns | object | No | - | { names: string[], hash_type?: string } | Phone columns (only with csv_resource_uri) |
| address_columns | object | No | - | { names: string[], hash_type?: string } | Address columns (only with csv_resource_uri) |
| format | string | No | "none" | "none", "csv", "json", "jsonl" | Export format - generates presigned S3 URL valid for 1 hour |
| identifier_types | array | No | ["email"] | "name", "email", "phone", "address", "maid" | Contact types to return in identifiers field |
| workflow_id | string | No | - | Valid UUID | Workflow session identifier for correlation |

Parameter Details:

entity_type:

  • Required. Use "person" for individual identities or "business" for company entities.

identifiers:

  • Array of objects, each specifying id_type, hash_type, and values[]
  • Allows querying across different identifier types in one call
  • Email/phone/maid can be mixed in a single call
  • Address must be in a separate call (uses geospatial H3 matching)
  • Returns Noisy-OR aggregated overall_quality_score per entity
  • Mutually exclusive with csv_resource_uri
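
Because address identifiers must go in a separate call from email, phone, and MAID identifiers, a caller holding a mixed list can partition it before building requests. The following is a minimal sketch of that step; the `splitIdentifiers` helper and the `IdentifierGroup` type name are hypothetical and not part of the API.

```typescript
// Hypothetical client-side helper; not part of the documented API.
type IdType = "email" | "phone" | "address" | "maid";
type HashType = "plaintext" | "md5" | "sha1" | "sha256";

interface IdentifierGroup {
  id_type: IdType;
  hash_type: HashType;
  values: string[];
}

// Partition identifier groups so that address groups go in their own request,
// while email, phone, and MAID groups can share a single request.
function splitIdentifiers(groups: IdentifierGroup[]): {
  mixedGroups: IdentifierGroup[];
  addressGroups: IdentifierGroup[];
} {
  const mixedGroups = groups.filter((g) => g.id_type !== "address");
  const addressGroups = groups.filter((g) => g.id_type === "address");
  return { mixedGroups, addressGroups };
}

const split = splitIdentifiers([
  { id_type: "email", hash_type: "plaintext", values: ["alice@example.com"] },
  { id_type: "address", hash_type: "plaintext", values: ["123 Main St, San Francisco, CA 94105"] },
  { id_type: "phone", hash_type: "plaintext", values: ["+15551234567"] },
]);
console.log(split.mixedGroups.length, split.addressGroups.length);
```

Each non-empty partition would then be sent as the `identifiers` array of its own request.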

csv_resource_uri:

  • Workflow resource URI pointing to a CSV file (e.g., workflow://{workflow_id}/uploads/customers.csv)
  • At least one of email_columns, phone_columns, or address_columns must be provided
  • Mutually exclusive with identifiers

email_columns / phone_columns / address_columns (CSV mode):

  • names - Array of CSV column names containing the identifier values
  • hash_type - Hash format of values in those columns: "plaintext" (default), "md5", "sha1", or "sha256"
  • Use hash_type when your CSV contains pre-hashed identifiers
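
If identifiers need to be hashed before upload, something along these lines could produce the values. This sketch uses Node's built-in `crypto` module; the trim-and-lowercase normalization and the lowercase hex encoding are assumptions, since the exact input form the service expects for hashed values is not stated here. The function names are hypothetical.

```typescript
import { createHash } from "node:crypto";

type HashAlgorithm = "md5" | "sha1" | "sha256";

// Hex-encode the digest of a value. Node emits lowercase hex;
// whether the service requires lowercase hex is an assumption.
function hashIdentifier(value: string, algorithm: HashAlgorithm): string {
  return createHash(algorithm).update(value, "utf8").digest("hex");
}

// Assumed normalization (trim + lowercase) before hashing an email.
// The service's own normalization rules may differ.
function hashEmail(email: string, algorithm: HashAlgorithm): string {
  return hashIdentifier(email.trim().toLowerCase(), algorithm);
}

console.log(hashEmail(" Alice@Example.com ", "sha256"));
```

The resulting column would be declared with the matching `hash_type`, for example `{ "names": ["email_sha256"], "hash_type": "sha256" }`.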

Supported id_types:

  • "email" - Email addresses with automatic normalization
  • "phone" - Phone numbers (E.164 format recommended)
  • "address" - Physical addresses (geospatial matching via H3 resolution 11, ~28m precision)
  • "maid" - Mobile advertising IDs (IDFA for iOS, GAID for Android)

Supported hash_types:

  • "plaintext" - Unhashed values
  • "md5" - MD5 hash
  • "sha1" - SHA-1 hash
  • "sha256" - SHA-256 hash

Example identifiers:

{
  "identifiers": [
    {
      "id_type": "email",
      "hash_type": "plaintext",
      "values": ["alice@example.com", "bob@example.com"]
    },
    {
      "id_type": "phone",
      "hash_type": "plaintext",
      "values": ["+15551234567"]
    }
  ]
}

format:

  • When set to csv, json, or jsonl, generates a presigned S3 download URL
  • URL expires in 1 hour
  • Returns export metadata in response

identifier_types:

  • Array of contact types to return in the identifiers field
  • Valid types: "name", "email", "phone", "address", "maid"
  • Default: ["email"]
  • Returns actual stored contact data from the resolved entity profiles
  • Eliminates the need for a follow-up entity_enrich call to retrieve contact info

workflow_id:

  • Optional UUID for tracking related tool calls in a session
  • If not provided, a new workflow_id is generated
  • Used for deterministic sampling and feedback correlation

Request Schema:

interface EntityResolveParams {
  entity_type: "person" | "business";
  identifiers?: Array<{
    id_type: "email" | "phone" | "address" | "maid";
    hash_type: "plaintext" | "md5" | "sha1" | "sha256";
    values: string[];
  }>;
  csv_resource_uri?: string;
  email_columns?: { names: string[]; hash_type?: "plaintext" | "md5" | "sha1" | "sha256" };
  phone_columns?: { names: string[]; hash_type?: "plaintext" | "md5" | "sha1" | "sha256" };
  address_columns?: { names: string[]; hash_type?: "plaintext" | "md5" | "sha1" | "sha256" };
  format?: "none" | "csv" | "json" | "jsonl";
  identifier_types?: Array<"name" | "email" | "phone" | "address" | "maid">;
  workflow_id?: string;
}

Output Format

Success Response:

{
  entities: Array<{
    entity_id: number;
    overall_quality_score: number;
    matches: Array<{
      criterion_type: string;
      criterion_value: string;
      quality_score: number;
    }>;
    identifiers: {
      [type: string]: string[];
    };
    address?: {
      normalized_address: string;
      latitude: number;
      longitude: number;
      distance_meters: number;
    };
  }>;
  stats: {
    requested: number;
    resolved: number;
    rate: number;
  };
  export?: {
    url: string;
    format: "csv" | "json" | "jsonl";
    rows: number;
    size_bytes: number;
    expires_at: string;
    resource_uri: string;
  };
  tool_trace_id: string;
  workflow_id: string;
}

Response Fields:

| Field | Type | Description |
| --- | --- | --- |
| entities | array | Array of resolved entities grouped by entity_id |
| entities[].entity_id | number | Entity ID |
| entities[].overall_quality_score | number | Noisy-OR aggregated confidence (0-1) across all matches |
| entities[].matches | array | Individual criterion matches with per-criterion scores |
| entities[].matches[].criterion_type | string | Type (e.g., "email_plaintext", "phone_md5") |
| entities[].matches[].criterion_value | string | The matched value |
| entities[].matches[].quality_score | number | Quality score for this specific match (0-1) |
| entities[].identifiers | object | Stored contact data, keyed by type |
| entities[].address | object | Geocoding data with distance (only for address queries) |
| stats.requested | number | Total identifiers provided |
| stats.resolved | number | Total identifiers successfully resolved |
| stats.rate | number | Resolution rate (resolved/requested) |
| export | object | Export metadata (only when format is csv/json/jsonl) |
| export.url | string | Presigned S3 download URL (expires in 1 hour) |
| export.resource_uri | string | Workflow resource URI for the exported file |
| tool_trace_id | string | OpenTelemetry trace ID for this tool execution |
| workflow_id | string | Workflow session identifier |
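
To make the relationship between per-match scores and `overall_quality_score` concrete, here is a sketch of the textbook Noisy-OR rule, 1 - Π(1 - q_i), which treats each match as independent evidence. This assumes the service applies the unweighted form; any weighting or capping it may use is not documented here. `noisyOr` and `resolutionRate` are hypothetical names, and returning 0 for zero requested identifiers is an assumption.

```typescript
// Textbook Noisy-OR: probability that at least one independent match is correct.
function noisyOr(scores: number[]): number {
  // Multiply the "miss" probabilities (1 - q), then take the complement.
  return 1 - scores.reduce((acc, q) => acc * (1 - q), 1);
}

// stats.rate as described: resolved / requested.
// Returning 0 when nothing was requested is an assumption.
function resolutionRate(requested: number, resolved: number): number {
  return requested === 0 ? 0 : resolved / requested;
}

console.log(noisyOr([0.95]));     // a single match leaves the score unchanged (up to rounding)
console.log(noisyOr([0.9, 0.8])); // 1 - (0.1 * 0.2), approximately 0.98
console.log(resolutionRate(2, 1));
```

Under this rule, each additional match can only raise the aggregate score, which is why an entity matched on both email and phone scores at least as high as either match alone.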

Example Response (Email Resolution):

{
  "entities": [
    {
      "entity_id": 123456,
      "overall_quality_score": 0.95,
      "matches": [
        {
          "criterion_type": "email_plaintext",
          "criterion_value": "john.doe@example.com",
          "quality_score": 0.95
        }
      ],
      "identifiers": {
        "email": ["john.doe@example.com", "jdoe@work.com"]
      }
    }
  ],
  "stats": {
    "requested": 2,
    "resolved": 1,
    "rate": 0.5
  },
  "tool_trace_id": "a1b2c3d4e5f6",
  "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}
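
A response like the one above might be consumed as follows. This sketch collects the stored emails of entities whose aggregate score meets a caller-chosen cutoff; the `emailsAboveScore` helper, the trimmed `ResolvedEntity` type, and the cutoff value are illustrative assumptions rather than part of the API.

```typescript
// Trimmed view of an entry in the response's entities array.
interface ResolvedEntity {
  entity_id: number;
  overall_quality_score: number;
  identifiers: { [type: string]: string[] };
}

// Collect de-duplicated emails from entities at or above `minScore`.
function emailsAboveScore(entities: ResolvedEntity[], minScore: number): string[] {
  const emails = new Set<string>();
  for (const entity of entities) {
    if (entity.overall_quality_score < minScore) continue;
    for (const email of entity.identifiers["email"] ?? []) {
      emails.add(email);
    }
  }
  return Array.from(emails);
}

const sampleEntities: ResolvedEntity[] = [
  {
    entity_id: 123456,
    overall_quality_score: 0.95,
    identifiers: { email: ["john.doe@example.com", "jdoe@work.com"] },
  },
];
console.log(emailsAboveScore(sampleEntities, 0.9));
```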

Example Response (Address Resolution with Distance):

{
  "entities": [
    {
      "entity_id": 789012,
      "overall_quality_score": 0.88,
      "matches": [
        {
          "criterion_type": "address_h3_11",
          "criterion_value": "123 Main St, San Francisco, CA 94105",
          "quality_score": 0.88
        }
      ],
      "identifiers": {
        "email": ["resident@example.com"]
      },
      "address": {
        "normalized_address": "123 Main St, San Francisco, CA 94105, USA",
        "latitude": 37.7749,
        "longitude": -122.4194,
        "distance_meters": 15.3
      }
    }
  ],
  "stats": {
    "requested": 1,
    "resolved": 1,
    "rate": 1.0
  },
  "tool_trace_id": "a1b2c3d4e5f6",
  "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}

Error Handling

Common Errors:

  • Both identifiers and csv_resource_uri provided: "identifiers and csv_resource_uri are mutually exclusive"
  • Neither provided: "Either identifiers or csv_resource_uri must be provided"
  • csv_resource_uri without column mappings: "At least one of email_columns, phone_columns, or address_columns must be provided"
  • Service temporarily unavailable: "Request failed - please try again"
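
The first three errors can be caught client-side before a request is sent. A minimal sketch, assuming the checks run in the order listed above (the server's actual order is not documented); `validateRequest` and the `ResolveRequest` type name are hypothetical.

```typescript
interface ColumnSpec {
  names: string[];
  hash_type?: string;
}

// Only the fields relevant to these checks.
interface ResolveRequest {
  entity_type: "person" | "business";
  identifiers?: unknown[];
  csv_resource_uri?: string;
  email_columns?: ColumnSpec;
  phone_columns?: ColumnSpec;
  address_columns?: ColumnSpec;
}

// Returns the documented error message for the first violated rule, or null if none.
function validateRequest(req: ResolveRequest): string | null {
  const hasIdentifiers = req.identifiers !== undefined;
  const hasCsv = req.csv_resource_uri !== undefined;
  if (hasIdentifiers && hasCsv) {
    return "identifiers and csv_resource_uri are mutually exclusive";
  }
  if (!hasIdentifiers && !hasCsv) {
    return "Either identifiers or csv_resource_uri must be provided";
  }
  if (hasCsv && !req.email_columns && !req.phone_columns && !req.address_columns) {
    return "At least one of email_columns, phone_columns, or address_columns must be provided";
  }
  return null;
}

console.log(validateRequest({ entity_type: "person" }));
```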

Address Matching Behavior

  • Returns only the best-scoring entity or entities per input address (not all entities in the H3 cell)
  • A minimum quality threshold of 0.75 (about 31m distance) excludes weaker matches
  • Household members tied at the maximum score are all returned
  • Maximum batch size: 1,000 addresses per request
  • Quality scores use distance-based decay: full score within 5m, linear decay to 10% at 100m, floor at 10% beyond
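
The decay rule and the batch limit above can be sketched as follows. This assumes "full score" means 1.0 and that the decay is applied to the raw distance in meters with no other factors; the function names are hypothetical. Inverting the linear segment at 0.75 gives roughly 31.4 m, which lines up with the ~31m threshold distance quoted above.

```typescript
// Piecewise score as described: 1.0 up to 5 m, linear down to 0.1 at 100 m, 0.1 beyond.
function distanceScore(distanceMeters: number): number {
  if (distanceMeters <= 5) return 1;
  if (distanceMeters >= 100) return 0.1;
  return 1 - 0.9 * ((distanceMeters - 5) / 95);
}

// Inverse of the linear segment: the distance at which the score equals `score`.
// Only meaningful for 0.1 < score < 1.
function distanceAtScore(score: number): number {
  return 5 + ((1 - score) / 0.9) * 95;
}

// Split a list of addresses into batches no larger than `size` (1,000 per request).
function chunk<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

console.log(distanceScore(15.3));
console.log(distanceAtScore(0.75)); // roughly 31.4
```

A caller with more than 1,000 addresses would send one request per batch returned by `chunk(addresses, 1000)`.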

Usage Examples

Example 1: Simple email resolution

{
  "entity_type": "person",
  "identifiers": [
    {
      "id_type": "email",
      "hash_type": "plaintext",
      "values": ["alice@example.com", "bob@example.com"]
    }
  ]
}

Example 2: Multi-criterion (email + phone)

{
  "entity_type": "person",
  "identifiers": [
    {
      "id_type": "email",
      "hash_type": "plaintext",
      "values": ["alice@example.com"]
    },
    {
      "id_type": "phone",
      "hash_type": "plaintext",
      "values": ["+15551234567"]
    }
  ]
}

Example 3: CSV resource input

{
  "entity_type": "person",
  "csv_resource_uri": "workflow://550e8400-e29b-41d4-a716-446655440000/uploads/customers.csv",
  "email_columns": { "names": ["email"] },
  "phone_columns": { "names": ["phone"] }
}

Example 3b: CSV with pre-hashed identifiers

{
  "entity_type": "person",
  "csv_resource_uri": "workflow://550e8400-e29b-41d4-a716-446655440000/uploads/customers.csv",
  "email_columns": { "names": ["email_md5"], "hash_type": "md5" },
  "phone_columns": { "names": ["phone_sha256"], "hash_type": "sha256" }
}

Example 4: Hashed identifiers with export

{
  "entity_type": "person",
  "identifiers": [
    {
      "id_type": "email",
      "hash_type": "md5",
      "values": ["5d41402abc4b2a76b9719d911017c592"]
    }
  ],
  "format": "csv"
}

Example 5: Request specific identifier types

{
  "entity_type": "person",
  "identifiers": [
    {
      "id_type": "email",
      "hash_type": "plaintext",
      "values": ["alice@example.com"]
    }
  ],
  "identifier_types": ["email", "phone", "name"]
}
