Watt Data

Resolve entity identities by matching emails, phones, addresses, MAIDs, websites, or social handles. Supports multi-criterion queries with Noisy-OR quality score aggregation. Returns entity IDs grouped by individual with quality scores.

Quick Example

{
  "entity_type": "person",
  "identifiers": [
    {
      "id_type": "email",
      "hash_type": "plaintext",
      "values": ["alice@example.com", "bob@example.com"]
    }
  ]
}

Input Parameters

| Parameter | Type | Required | Default | Constraints | Description |
| --- | --- | --- | --- | --- | --- |
| entity_type | string | Yes | - | "person" or "business" | Type of entity to resolve |
| identifiers | array | Conditional | - | Max 50 groups per request; each group's values array is capped at 3,000 entries (use csv_resource_uri for larger inputs) | Multi-criterion identifiers. Mutually exclusive with csv_resource_uri |
| csv_resource_uri | string | Conditional | - | workflow:// URI | CSV file with identifiers. Mutually exclusive with identifiers |
| lookup_columns | object | Conditional | - | See below | Column-mapping for CSV-based resolution. Required when csv_resource_uri is set; at least one sub-key (email, phone, address, name, linkedin, domain) must have non-empty names |
| offset | number | No | 0 | Integer ≥ 0 | Number of CSV data rows to skip before reading. Use with limit to paginate large CSVs across multiple calls. Only applies to csv_resource_uri; ignored when identifiers is used |
| limit | number | No | 200000 | 1 ≤ limit ≤ 200000 | Maximum number of CSV data rows to read in this call. Only applies to csv_resource_uri; ignored when identifiers is used |
| format | string | No | "none" | "none", "csv", "json", "jsonl" | Export format; generates a presigned S3 URL valid for 1 hour |
| identifier_types | array | No | person → ["email"], business → ["name"] | person: "name", "email", "phone", "address", "maid", "social:linkedin"; business: "name", "phone", "address", "social:linkedin", "website" | Contact types to return in identifiers field (allowed values depend on entity_type) |
| workflow_id | string | No | - | Valid UUID | Workflow session identifier for correlation |

Parameter Details:

entity_type:

  • Required. Use "person" for individual identities or "business" for company entities.

identifiers:

  • Array of objects, each specifying id_type, hash_type, and values[]
  • Allows querying across different identifier types in one call
  • Email/phone/maid can be mixed in a single call
  • Address identifiers can also be included alongside other types
  • Returns Noisy-OR aggregated overall_quality_score per entity
  • Capped at 50 identifier groups per request — split larger inputs into multiple calls
  • Each identifier group's values array is capped at 3,000 entries — for larger inputs use csv_resource_uri (governed by a separate 200,000-row cap)
  • Mutually exclusive with csv_resource_uri
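
The two caps above (3,000 values per group, 50 groups per request) can be handled client-side before calling the tool. The following is a minimal sketch, not part of the API: the helper names splitGroup and batchIdentifiers and the IdentifierGroup type are hypothetical, and only the field names and the two limits come from this page.

```typescript
// Hypothetical client-side helpers; only the field names and caps are documented.
type HashType = "plaintext" | "md5" | "sha1" | "sha256";

interface IdentifierGroup {
  id_type: string;
  hash_type: HashType;
  values: string[];
}

const MAX_VALUES_PER_GROUP = 3000; // documented per-group cap
const MAX_GROUPS_PER_REQUEST = 50; // documented per-request cap

// Split one oversized group into groups of at most 3,000 values each.
function splitGroup(group: IdentifierGroup): IdentifierGroup[] {
  const out: IdentifierGroup[] = [];
  for (let i = 0; i < group.values.length; i += MAX_VALUES_PER_GROUP) {
    out.push({
      id_type: group.id_type,
      hash_type: group.hash_type,
      values: group.values.slice(i, i + MAX_VALUES_PER_GROUP),
    });
  }
  return out;
}

// Pack the split groups into batches of at most 50 groups.
// Each batch is the `identifiers` array for one call.
function batchIdentifiers(groups: IdentifierGroup[]): IdentifierGroup[][] {
  const split: IdentifierGroup[] = [];
  for (const group of groups) {
    for (const part of splitGroup(group)) split.push(part);
  }
  const batches: IdentifierGroup[][] = [];
  for (let i = 0; i < split.length; i += MAX_GROUPS_PER_REQUEST) {
    batches.push(split.slice(i, i + MAX_GROUPS_PER_REQUEST));
  }
  return batches;
}
```

With 7,000 email values in one group, this produces a single batch holding three groups (3,000 + 3,000 + 1,000). Note that score aggregation happens per call, so criteria for the same entity that land in different batches are presumably not combined; for very large inputs csv_resource_uri is likely the better fit.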

csv_resource_uri:

  • Workflow resource URI pointing to a CSV file (e.g., workflow://{workflow_id}/uploads/customers.csv)
  • Requires lookup_columns with at least one identifier type populated
  • The CSV is processed in pages of up to 200,000 rows per call. When more rows remain, the response includes a next_offset field — pass it back as offset on the next call. The field is omitted on the last page.
  • Mutually exclusive with identifiers
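
The next_offset cursor described above lends itself to a simple loop. This is a sketch under stated assumptions: callTool is a hypothetical stand-in for however your client invokes the tool with the rest of the request filled in, and the CsvPageParams and CsvPageResult types are illustrative. Only offset, limit, next_offset, and entities come from this page.

```typescript
// Illustrative shapes; only offset, limit, next_offset, and entities are documented.
interface CsvPageParams {
  offset: number;
  limit: number;
}

interface CsvPageResult<E> {
  entities: E[];
  next_offset?: number; // omitted on the last page
}

// `callTool` is a hypothetical function supplied by the caller.
async function resolveAllPages<E>(
  callTool: (page: CsvPageParams) => Promise<CsvPageResult<E>>,
  limit: number = 200000,
): Promise<E[]> {
  const all: E[] = [];
  let offset: number | undefined = 0;
  while (offset !== undefined) {
    const page: CsvPageResult<E> = await callTool({ offset, limit });
    // Append one by one; spreading a very large page into push() can overflow the stack.
    for (const entity of page.entities) all.push(entity);
    offset = page.next_offset; // undefined ends the loop
  }
  return all;
}
```

The loop stops as soon as a response omits next_offset, which matches the last-page behavior described above.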

lookup_columns (CSV mode):

Maps CSV columns to identifier types. The same shape is used by resolve_and_enrich_rows — see Conventions → CSV Column Mapping for the canonical reference, including per-identifier rules, multi-column address joining, and the address_parse_low_yield warning.

{
  email?:    { names: string[]; hash_type?: "plaintext" | "md5" | "sha1" | "sha256" },
  phone?:    { names: string[]; hash_type?: "plaintext" | "md5" | "sha1" | "sha256" },
  address?:  { names: string[] },
  name?:     { names: string[] },
  linkedin?: { names: string[] },   // resolves via the `social:linkedin` identifier type
  domain?:   { names: string[] }    // business entities — resolves via the `website` identifier type
}

At least one sub-key with non-empty names is required. Only email and phone accept hash_type; the other types require plaintext values. When address.names lists more than one column, per-row cell values are concatenated in listed order with ", " before libpostal parsing — list them street-first (address1, address2?, city?, region?, postcode, country?).
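
To see why column order matters, the concatenation step can be sketched as follows. The function name joinAddressColumns is hypothetical, and trimming cells and skipping empty ones are assumptions made for the illustration; this page only states that cells are joined in listed order with ", ".

```typescript
// Hypothetical illustration of the per-row join that happens before address parsing.
// Assumption: empty cells are skipped and whitespace is trimmed.
function joinAddressColumns(row: Record<string, string>, names: string[]): string {
  return names
    .map((name) => (row[name] ?? "").trim())
    .filter((cell) => cell.length > 0)
    .join(", ");
}

const exampleRow: Record<string, string> = {
  address1: "123 Main St",
  address2: "",
  city: "San Francisco",
  postcode: "94105",
};

// Street-first order yields a canonical-looking address string.
const streetFirst = joinAddressColumns(exampleRow, ["address1", "address2", "city", "postcode"]);

// Listing the postcode first yields a string a parser is more likely to misread.
const postcodeFirst = joinAddressColumns(exampleRow, ["postcode", "city", "address1"]);
```

Here streetFirst is "123 Main St, San Francisco, 94105", while postcodeFirst is "94105, San Francisco, 123 Main St".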

Migration from legacy *_columns keys: The flat email_columns, phone_columns, and address_columns parameters from earlier V2 betas are rejected with a per-key error naming the lookup_columns.<key> replacement. Update existing callers to the nested shape.

Supported id_types:

Person entities:

  • "email" - Email addresses with automatic normalization
  • "phone" - Phone numbers (E.164 format recommended)
  • "address" - Physical addresses (libpostal-parsed component matching with apartment/unit resolution)
  • "maid" - Mobile advertising IDs (IDFA for iOS, GAID for Android)
  • "name" - Person names (first/last/full)
  • "social:linkedin" - LinkedIn profile, passed as either a bare slug (e.g. john-doe-070215) or full URL (e.g. https://www.linkedin.com/in/john-doe-070215/). Scheme, www., trailing slashes, and path suffixes like /details/experience are stripped automatically.

Business entities:

  • "name" - Company names
  • "phone" - Business phone numbers
  • "address" - Business addresses (same parsing as person addresses)
  • "website" - Company website or domain (e.g. https://example.com or example.com)
  • "social:linkedin" - LinkedIn company page, passed as either a bare slug (e.g. tennis-en-padel-shop-noord) or full URL (e.g. https://www.linkedin.com/company/tennis-en-padel-shop-noord/). Scheme, www., trailing slashes, and /about-style path suffixes are stripped automatically. Additional networks (social:<network>) may be added in the future.
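
The LinkedIn normalization described above can be approximated client-side, for example to deduplicate inputs before sending them. This is a rough sketch, not the service's actual normalizer: the function name normalizeLinkedinSlug is hypothetical, dropping query strings and fragments is an assumption, and country subdomains (such as nl.linkedin.com) are not handled.

```typescript
// Hypothetical approximation of the documented stripping rules.
function normalizeLinkedinSlug(input: string): string {
  let s = input.trim();
  s = s.replace(/^https?:\/\//i, ""); // scheme
  s = s.replace(/^www\./i, ""); // www.
  s = s.replace(/^linkedin\.com\//i, ""); // host
  s = s.replace(/^(in|company)\//i, ""); // profile or company path prefix
  s = s.replace(/[?#].*$/, ""); // query string or fragment (assumption)
  // Keeping only the first path segment drops trailing slashes and
  // suffixes such as /details/experience or /about.
  return s.split("/")[0] ?? "";
}

const fromUrl = normalizeLinkedinSlug("https://www.linkedin.com/in/john-doe-070215/details/experience");
const fromSlug = normalizeLinkedinSlug("john-doe-070215");
```

Both fromUrl and fromSlug evaluate to "john-doe-070215", which is the sense in which a bare slug and a full URL are interchangeable.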

Supported hash_types:

  • "plaintext" - Unhashed values
  • "md5" - MD5 hash
  • "sha1" - SHA-1 hash
  • "sha256" - SHA-256 hash

Example identifiers:

{
  "identifiers": [
    {
      "id_type": "email",
      "hash_type": "plaintext",
      "values": ["alice@example.com", "bob@example.com"]
    },
    {
      "id_type": "phone",
      "hash_type": "plaintext",
      "values": ["+15551234567"]
    }
  ]
}

format:

  • When set to csv, json, or jsonl, generates a presigned S3 download URL
  • URL expires in 1 hour
  • Returns export metadata in response

identifier_types:

  • Array of contact types to return in the identifiers field
  • Allowed values depend on entity_type:
    • "person": "name", "email", "phone", "address", "maid", "social:linkedin"
    • "business": "name", "phone", "address", "social:linkedin", "website"
  • Defaults: person → ["email"], business → ["name"]
  • Values outside the set for the chosen entity_type are rejected
  • Returns actual stored contact data from the resolved entity profiles
  • Eliminates the need for a follow-up entity_enrich call to retrieve contact info

workflow_id:

  • Optional UUID for tracking related tool calls in a session
  • If not provided, a new workflow_id is generated
  • Used for deterministic sampling and feedback correlation

Request Schema:

interface EntityResolveParams {
  entity_type: "person" | "business";
  identifiers?: Array<{
    // person: "name" | "email" | "phone" | "address" | "maid" | "social:linkedin"
    // business: "name" | "phone" | "address" | "website" | "social:linkedin"
    id_type: string;
    hash_type: "plaintext" | "md5" | "sha1" | "sha256";
    values: string[];
  }>;
  csv_resource_uri?: string;
  lookup_columns?: {
    email?: { names: string[]; hash_type?: "plaintext" | "md5" | "sha1" | "sha256" };
    phone?: { names: string[]; hash_type?: "plaintext" | "md5" | "sha1" | "sha256" };
    address?: { names: string[] };
    name?: { names: string[] };
    linkedin?: { names: string[] };
    domain?: { names: string[] };
  };
  offset?: number;
  limit?: number;
  format?: "none" | "csv" | "json" | "jsonl";
  // person: Array<"name" | "email" | "phone" | "address" | "maid" | "social:linkedin">
  // business: Array<"name" | "phone" | "address" | "social:linkedin" | "website">
  identifier_types?: string[];
  workflow_id?: string;
}

Output Format

Success Response:

{
  entities: Array<{
    entity_id: number;
    overall_quality_score: number;
    matches: Array<{
      criterion_type: string;
      criterion_value: string;
      quality_score: number;
    }>;
    identifiers: {
      [type: string]: string[];
    };
    address?: {
      normalized_key: string;
      // latitude, longitude, distance_meters are not returned in V2 responses
    };
  }>,
  stats: {
    requested: number,
    resolved: number,
    rate: number,
    resolved_by_type: Record<string, number>
  },
  export?: {
    url: string;
    format: "csv" | "json" | "jsonl";
    rows: number;
    size_bytes: number;
    expires_at: string;
    resource_uri: string;
  },
  warnings?: Array<{ code: string; message: string }>,
  tool_trace_id: string,
  workflow_id: string
}

Response Fields:

| Field | Type | Description |
| --- | --- | --- |
| entities | array | Array of resolved entities grouped by entity_id |
| entities[].entity_id | number | Entity ID |
| entities[].overall_quality_score | number | Noisy-OR aggregated confidence (0-1) across all matches |
| entities[].matches | array | Individual criterion matches with per-criterion scores |
| entities[].matches[].criterion_type | string | Type (e.g., "email_plaintext", "phone_md5") |
| entities[].matches[].criterion_value | string | The matched value |
| entities[].matches[].quality_score | number | Quality score for this specific match (0-1) |
| entities[].identifiers | object | Stored contact data, keyed by type |
| entities[].address | object | Address match data (only for address queries). Contains normalized_key; geo coordinates are not returned in V2 responses. |
| stats.requested | number | Total identifier values provided across all groups |
| stats.resolved | number | Distinct entities matched. rate = resolved / requested is bounded to [0, 1]. |
| stats.rate | number | Distinct entities resolved per identifier requested |
| stats.resolved_by_type | object | Distinct entities matched per identifier type (e.g. {"email": 171, "address": 226}). Each entity contributes at most 1 per type bucket regardless of how many criteria of that type matched it. |
| export | object | Export metadata (only when format is csv/json/jsonl) |
| export.url | string | Presigned S3 download URL (expires in 1 hour) |
| export.resource_uri | string | Workflow resource URI for the exported file |
| warnings | array | Optional. Non-fatal warnings raised during the run. CSV-mode resolution emits address_parse_low_yield when most address values failed libpostal parsing, typically a sign that the columns under lookup_columns.address.names were listed in the wrong order, or that a single mapped column contains only fragments without a postcode |
| tool_trace_id | string | OpenTelemetry trace ID for this tool execution |
| workflow_id | string | Workflow session identifier |
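
For intuition about overall_quality_score, the textbook Noisy-OR combination treats each match as independent evidence and computes 1 minus the product of (1 - quality_score) over all matches. The sketch below assumes the unweighted textbook form; this page does not spell out the exact formula the service uses, and the function name noisyOr is illustrative.

```typescript
// Textbook Noisy-OR: probability that at least one independent signal is correct.
// Assumption: unweighted form; the service's exact aggregation is not documented here.
function noisyOr(scores: number[]): number {
  let allWrong = 1; // probability that every individual match is wrong
  for (const q of scores) {
    allWrong *= 1 - q;
  }
  return 1 - allWrong;
}

// Two matches at 0.8 and 0.5 combine to roughly 0.9: 1 - (0.2 * 0.5).
const combined = noisyOr([0.8, 0.5]);

// A single match passes through unchanged (up to floating-point rounding).
const single = noisyOr([0.95]);
```

Under this form, adding a second matching criterion can only raise the overall score, and an entity with a single match has an overall score equal to that match's quality_score, which is consistent with the example responses that follow.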

Example Response (Email Resolution):

{
  "entities": [
    {
      "entity_id": 123456,
      "overall_quality_score": 0.95,
      "matches": [
        {
          "criterion_type": "email_plaintext",
          "criterion_value": "john.doe@example.com",
          "quality_score": 0.95
        }
      ],
      "identifiers": {
        "email": ["john.doe@example.com", "jdoe@work.com"]
      }
    }
  ],
  "stats": {
    "requested": 2,
    "resolved": 1,
    "rate": 0.5,
    "resolved_by_type": { "email": 1 }
  },
  "tool_trace_id": "a1b2c3d4e5f6",
  "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}

Example Response (Address Resolution with Key-Based Matching):

{
  "entities": [
    {
      "entity_id": 789012,
      "overall_quality_score": 0.88,
      "matches": [
        {
          "criterion_type": "address_plaintext",
          "criterion_value": "123 Main St, San Francisco, CA 94105",
          "quality_score": 0.88
        }
      ],
      "identifiers": {
        "email": ["resident@example.com"]
      },
      "address": {
        "normalized_key": "123 main st san francisco ca 94105 usa"
      }
    }
  ],
  "stats": {
    "requested": 1,
    "resolved": 1,
    "rate": 1.0,
    "resolved_by_type": { "address": 1 }
  },
  "tool_trace_id": "a1b2c3d4e5f6",
  "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}

Error Handling

Common Errors:

  • Both identifiers and csv_resource_uri provided: "identifiers and csv_resource_uri are mutually exclusive. Provide one or the other."
  • Neither provided: "Either identifiers or csv_resource_uri must be provided."
  • csv_resource_uri without column mappings: "When using csv_resource_uri, lookup_columns must specify at least one identifier type (email, phone, address, name, linkedin, or domain) with non-empty names."
  • Address identifier with non-plaintext hash_type: "Address identifiers require hash_type 'plaintext' — address parsing cannot use hashed values"
  • Social identifier (social:linkedin, etc.) with non-plaintext hash_type: "Social identifiers require hash_type 'plaintext' — slug normalization cannot use hashed values"
  • Identifier type not valid for the chosen entity_type (e.g., maid for a business): "Identifier types not allowed for entity_type='business'. Allowed: name, phone, address, social:linkedin, website. Violations: identifiers[0].id_type='maid'."
  • More than 50 identifier groups in identifiers: "Maximum 50 identifier groups allowed."
  • More than 3,000 values in any identifier group: "Maximum 3000 values per identifier group."
  • Service temporarily unavailable: "Failed to resolve entities. Please try again or contact support if the issue persists." Carries a structured details payload with cause, hint, and workflow_id so on-call can correlate with ClickStack — see the RESOLVE_ERROR_CAUSES enum in lib/util/classifyResolveError.ts for the bounded cause set.
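
Most of the errors above can be caught before sending a request. The following pre-flight check is a sketch for inline identifiers only: the names preflight and ResolveRequest are hypothetical, the returned strings are illustrative rather than the service's exact messages, and lookup_columns validation is left out.

```typescript
// Hypothetical client-side check mirroring the documented validation rules.
interface ResolveRequest {
  entity_type: "person" | "business";
  identifiers?: Array<{ id_type: string; hash_type: string; values: string[] }>;
  csv_resource_uri?: string;
}

const ALLOWED_ID_TYPES: Record<"person" | "business", string[]> = {
  person: ["name", "email", "phone", "address", "maid", "social:linkedin"],
  business: ["name", "phone", "address", "social:linkedin", "website"],
};

function preflight(req: ResolveRequest): string[] {
  const problems: string[] = [];
  const hasIdentifiers = req.identifiers !== undefined;
  const hasCsv = req.csv_resource_uri !== undefined;
  if (hasIdentifiers && hasCsv) problems.push("identifiers and csv_resource_uri are mutually exclusive");
  if (!hasIdentifiers && !hasCsv) problems.push("either identifiers or csv_resource_uri is required");

  const groups = req.identifiers ?? [];
  if (groups.length > 50) problems.push("more than 50 identifier groups");
  groups.forEach((group, i) => {
    if (group.values.length > 3000) problems.push(`identifiers[${i}]: more than 3000 values`);
    if (ALLOWED_ID_TYPES[req.entity_type].indexOf(group.id_type) === -1) {
      problems.push(`identifiers[${i}]: id_type '${group.id_type}' not allowed for ${req.entity_type}`);
    }
    // Address and social identifiers are documented as plaintext-only.
    const plaintextOnly = group.id_type === "address" || group.id_type.indexOf("social:") === 0;
    if (plaintextOnly && group.hash_type !== "plaintext") {
      problems.push(`identifiers[${i}]: ${group.id_type} requires hash_type 'plaintext'`);
    }
  });
  return problems;
}
```

An empty array means the request passes these checks; the service remains the source of truth and may reject requests for reasons not covered here.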

For files larger than 200,000 rows, paginate using the next_offset cursor returned in the response: pass it back as offset on the next call until the field is omitted (last page). offset, limit, and next_offset only apply on the CSV path; they are ignored when inline identifiers is used.

Address Matching Behavior

  • Addresses are parsed using libpostal into normalized components (street, city, state, zip, unit)
  • Matching is performed at both the street level (address_plaintext) and unit level (address_unit_plaintext)
  • When an input address has a unit AND the unit lookup matches at least one entity (i.e., the building has unit-precision data), entities matched for that input via the street criterion only — without a unit match for the same input — are dropped. This prevents the street-level fallback from returning the whole building when a more specific unit match is available.
  • For unit-bearing inputs whose unit lookup returns nothing (no unit-precision data exists for the building), the street fallback is preserved on every matched entity, with a 0.6x penalty applied to its quality score as a signal that unit precision could not be established.
  • Returns only the best-scoring entity(s) per input address
  • Household members tied at max score are all returned
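
The rules above can be read as a small decision procedure per input address. The sketch below is an interpretation, not the service's implementation: the function selectAddressMatches and the AddressMatch type are hypothetical, and it assumes the street and unit lookups have already produced one final score per entity.

```typescript
// Hypothetical per-input selection following the documented rules.
interface AddressMatch {
  entity_id: number;
  quality_score: number;
}

const STREET_FALLBACK_PENALTY = 0.6; // documented 0.6x penalty

function selectAddressMatches(
  inputHasUnit: boolean,
  streetMatches: AddressMatch[],
  unitMatches: AddressMatch[],
): AddressMatch[] {
  let candidates: AddressMatch[];
  if (inputHasUnit && unitMatches.length > 0) {
    // Unit-precision data exists: street-only matches are dropped.
    candidates = unitMatches;
  } else if (inputHasUnit) {
    // No unit data for the building: keep the street fallback, penalized.
    candidates = streetMatches.map((m) => ({
      entity_id: m.entity_id,
      quality_score: m.quality_score * STREET_FALLBACK_PENALTY,
    }));
  } else {
    candidates = streetMatches;
  }
  // Keep only the best-scoring entities; ties (e.g. household members) all survive.
  let best = -Infinity;
  for (const m of candidates) best = Math.max(best, m.quality_score);
  return candidates.filter((m) => m.quality_score === best);
}
```

For a unit-bearing input with street matches scored 0.9, 0.9, and 0.7 and no unit matches, the two entities tied at 0.9 are returned with their scores reduced by the 0.6x penalty; if the unit lookup returns any entity, only unit matches are considered.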

CSV-mode parse-null warning: when csv_resource_uri is used with lookup_columns.address and most address values fail libpostal parsing (over half of the unique address inputs), the response includes a warnings[] entry with code address_parse_low_yield. This usually means the columns were listed in the wrong order, or that a single mapped column contains only fragments without a postcode.

List multi-column address mappings street-first — address1, address2?, city?, region?, postcode, country? — so libpostal sees a canonical address string.

The warning only fires on severe failures (>50% null parse rate). Silently depressed match rates from ordering mistakes that still parse won't trip it, so verify column order whenever the match rate is below expectation.

Usage Examples

Example 1: Simple email resolution

{
  "entity_type": "person",
  "identifiers": [
    {
      "id_type": "email",
      "hash_type": "plaintext",
      "values": ["alice@example.com", "bob@example.com"]
    }
  ]
}

Example 2: Multi-criterion (email + phone)

{
  "entity_type": "person",
  "identifiers": [
    {
      "id_type": "email",
      "hash_type": "plaintext",
      "values": ["alice@example.com"]
    },
    {
      "id_type": "phone",
      "hash_type": "plaintext",
      "values": ["+15551234567"]
    }
  ]
}

Example 3: CSV resource input

{
  "entity_type": "person",
  "csv_resource_uri": "workflow://550e8400-e29b-41d4-a716-446655440000/uploads/customers.csv",
  "lookup_columns": {
    "email": { "names": ["email"] },
    "phone": { "names": ["phone"] }
  }
}

Example 3b: CSV with pre-hashed identifiers

{
  "entity_type": "person",
  "csv_resource_uri": "workflow://550e8400-e29b-41d4-a716-446655440000/uploads/customers.csv",
  "lookup_columns": {
    "email": { "names": ["email_md5"], "hash_type": "md5" },
    "phone": { "names": ["phone_sha256"], "hash_type": "sha256" }
  }
}

Example 4: Hashed identifiers with export

{
  "entity_type": "person",
  "identifiers": [
    {
      "id_type": "email",
      "hash_type": "md5",
      "values": ["5d41402abc4b2a76b9719d911017c592"]
    }
  ],
  "format": "csv"
}

Example 5: Request specific identifier types

{
  "entity_type": "person",
  "identifiers": [
    {
      "id_type": "email",
      "hash_type": "plaintext",
      "values": ["alice@example.com"]
    }
  ],
  "identifier_types": ["email", "phone", "name"]
}

Example 6a: Resolve a person by LinkedIn profile

Full profile URL and bare slug both normalize to the same lookup key:

{
  "entity_type": "person",
  "identifiers": [
    {
      "id_type": "social:linkedin",
      "hash_type": "plaintext",
      "values": ["https://www.linkedin.com/in/john-doe-070215/"]
    }
  ],
  "identifier_types": ["email", "phone", "social:linkedin"]
}

Example 6b: Resolve a business by LinkedIn company page

Bare slug and full URL both normalize to the same match, so the following two requests are equivalent:

{
  "entity_type": "business",
  "identifiers": [
    {
      "id_type": "social:linkedin",
      "hash_type": "plaintext",
      "values": ["tennis-en-padel-shop-noord"]
    }
  ],
  "identifier_types": ["name", "website", "social:linkedin"]
}

{
  "entity_type": "business",
  "identifiers": [
    {
      "id_type": "social:linkedin",
      "hash_type": "plaintext",
      "values": ["https://www.linkedin.com/company/tennis-en-padel-shop-noord/"]
    }
  ],
  "identifier_types": ["name", "website", "social:linkedin"]
}

Example 7: Resolve a business by website domain

{
  "entity_type": "business",
  "identifiers": [
    {
      "id_type": "website",
      "hash_type": "plaintext",
      "values": ["example.com"]
    }
  ],
  "identifier_types": ["name", "website"]
}
