Resolve entity identities by matching emails, phones, addresses, MAIDs, websites, or social handles. Supports multi-criterion queries with Noisy-OR quality score aggregation. Returns entity IDs grouped by individual with quality scores.
Quick Example
{
"entity_type": "person",
"identifiers": [
{
"id_type": "email",
"hash_type": "plaintext",
"values": ["alice@example.com", "bob@example.com"]
}
]
}
Input Parameters
| Parameter | Type | Required | Default | Constraints | Description |
|---|---|---|---|---|---|
| entity_type | string | Yes | - | "person" or "business" | Type of entity to resolve |
| identifiers | array | Conditional | - | Max 50 groups per request; each group's values array is capped at 3,000 entries — use csv_resource_uri for larger inputs | Multi-criterion identifiers. Mutually exclusive with csv_resource_uri |
| csv_resource_uri | string | Conditional | - | workflow:// URI | CSV file with identifiers. Mutually exclusive with identifiers |
| lookup_columns | object | Conditional | - | See below | Column-mapping for CSV-based resolution. Required when csv_resource_uri is set; at least one sub-key (email, phone, address, name, linkedin, domain) must have non-empty names |
| offset | number | No | 0 | Integer ≥ 0 | Number of CSV data rows to skip before reading. Use with limit to paginate large CSVs across multiple calls. Only applies to csv_resource_uri; ignored when identifiers is used |
| limit | number | No | 200000 | 1 ≤ limit ≤ 200000 | Maximum number of CSV data rows to read in this call. Only applies to csv_resource_uri; ignored when identifiers is used |
| format | string | No | "none" | "none", "csv", "json", "jsonl" | Export format - generates presigned S3 URL valid for 1 hour |
| identifier_types | array | No | person → ["email"], business → ["name"] | person: "name", "email", "phone", "address", "maid", "social:linkedin" — business: "name", "phone", "address", "social:linkedin", "website" | Contact types to return in identifiers field (allowed values depend on entity_type) |
| workflow_id | string | No | - | Valid UUID | Workflow session identifier for correlation |
Parameter Details:
entity_type:
- Required. Use "person" for individual identities or "business" for company entities.
identifiers:
- Array of objects, each specifying id_type, hash_type, and values[]
- Allows querying across different identifier types in one call
- Email/phone/maid can be mixed in a single call
- Address identifiers can also be included alongside other types
- Returns Noisy-OR aggregated overall_quality_score per entity
- Capped at 50 identifier groups per request; split larger inputs into multiple calls
- Each identifier group's values array is capped at 3,000 entries; for larger inputs use csv_resource_uri (governed by a separate 200,000-row cap)
- Mutually exclusive with csv_resource_uri
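The two inline limits above (50 groups per request, 3,000 values per group) can be handled client-side before calling the tool. The following is a minimal sketch, not the service's own logic: the helper names `splitGroup` and `batchIdentifiers` are hypothetical, and it assumes that one request may carry several groups with the same id_type, which this page does not state explicitly.

```typescript
// Limits quoted from the constraints listed above.
const MAX_VALUES_PER_GROUP = 3000;
const MAX_GROUPS_PER_REQUEST = 50;

type HashType = "plaintext" | "md5" | "sha1" | "sha256";

interface IdentifierGroup {
  id_type: string;
  hash_type: HashType;
  values: string[];
}

// Split one oversized group into several groups of at most `size` values.
function splitGroup(
  group: IdentifierGroup,
  size: number = MAX_VALUES_PER_GROUP
): IdentifierGroup[] {
  const out: IdentifierGroup[] = [];
  for (let i = 0; i < group.values.length; i += size) {
    out.push({
      id_type: group.id_type,
      hash_type: group.hash_type,
      values: group.values.slice(i, i + size),
    });
  }
  return out;
}

// Produce one `identifiers` array per request, each with at most `maxGroups` groups.
// Assumption: repeating an id_type across groups in one request is accepted.
function batchIdentifiers(
  groups: IdentifierGroup[],
  maxGroups: number = MAX_GROUPS_PER_REQUEST
): IdentifierGroup[][] {
  const flat: IdentifierGroup[] = [];
  for (const g of groups) {
    for (const part of splitGroup(g)) flat.push(part);
  }
  const batches: IdentifierGroup[][] = [];
  for (let i = 0; i < flat.length; i += maxGroups) {
    batches.push(flat.slice(i, i + maxGroups));
  }
  return batches;
}
```

Each inner array can be sent as the `identifiers` value of a separate call. Scores are aggregated per call, so criteria for the same entity that land in different batches would not be combined into one overall_quality_score; for very large inputs csv_resource_uri is likely the better fit.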
csv_resource_uri:
- Workflow resource URI pointing to a CSV file (e.g., workflow://{workflow_id}/uploads/customers.csv)
- Requires lookup_columns with at least one identifier type populated
- The CSV is processed in pages of up to 200,000 rows per call. When more rows remain, the response includes a next_offset field; pass it back as offset on the next call. The field is omitted on the last page.
- Mutually exclusive with identifiers
lookup_columns (CSV mode):
Maps CSV columns to identifier types. The same shape is used by resolve_and_enrich_rows — see Conventions → CSV Column Mapping for the canonical reference, including per-identifier rules, multi-column address joining, and the address_parse_low_yield warning.
{
email?: { names: string[]; hash_type?: "plaintext" | "md5" | "sha1" | "sha256" },
phone?: { names: string[]; hash_type?: "plaintext" | "md5" | "sha1" | "sha256" },
address?: { names: string[] },
name?: { names: string[] },
linkedin?: { names: string[] }, // resolves via the `social:linkedin` identifier type
domain?: { names: string[] } // business entities — resolves via the `website` identifier type
}
At least one sub-key with non-empty names is required. Only email and phone accept hash_type; the other types require plaintext values. When address.names lists more than one column, per-row cell values are concatenated in listed order with ", " before libpostal parsing, so list them street-first (address1, address2?, city?, region?, postcode, country?).
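To preview what the parser will receive for a multi-column address mapping, the concatenation rule can be approximated locally. This is a sketch only: `joinAddressColumns` is a hypothetical helper, and skipping blank cells is an assumption, since this page does not say how empty cells are treated.

```typescript
type CsvRow = Record<string, string | undefined>;

// Concatenate the mapped address columns in listed order with ", ".
// Assumption: blank or missing cells are skipped rather than emitted as empty parts.
function joinAddressColumns(row: CsvRow, names: string[]): string {
  const parts: string[] = [];
  for (const name of names) {
    const cell = (row[name] ?? "").trim();
    if (cell !== "") parts.push(cell);
  }
  return parts.join(", ");
}

// Example row with a street-first mapping.
const exampleRow: CsvRow = {
  address1: "123 Main St",
  address2: "",
  city: "San Francisco",
  region: "CA",
  postcode: "94105",
};
const exampleAddress = joinAddressColumns(exampleRow, [
  "address1",
  "address2",
  "city",
  "region",
  "postcode",
]);
```

Printing a few joined rows before uploading is a cheap way to spot a mapping that puts the postcode or city ahead of the street.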
Migration from legacy *_columns keys: The flat email_columns, phone_columns, and address_columns parameters from earlier V2 betas are rejected with a per-key error naming the lookup_columns.<key> replacement. Update existing callers to the nested shape.
Supported id_types:
Person entities:
- "email" - Email addresses with automatic normalization
- "phone" - Phone numbers (E.164 format recommended)
- "address" - Physical addresses (libpostal-parsed component matching with apartment/unit resolution)
- "maid" - Mobile advertising IDs (IDFA for iOS, GAID for Android)
- "name" - Person names (first/last/full)
- "social:linkedin" - LinkedIn profile, passed as either a bare slug (e.g. john-doe-070215) or full URL (e.g. https://www.linkedin.com/in/john-doe-070215/). Scheme, www., trailing slashes, and path suffixes like /details/experience are stripped automatically.
Business entities:
- "name" - Company names
- "phone" - Business phone numbers
- "address" - Business addresses (same parsing as person addresses)
- "website" - Company website or domain (e.g. https://example.com or example.com)
- "social:linkedin" - LinkedIn company page, passed as either a bare slug (e.g. tennis-en-padel-shop-noord) or full URL (e.g. https://www.linkedin.com/company/tennis-en-padel-shop-noord/). Scheme, www., trailing slashes, and /about-style path suffixes are stripped automatically. Additional networks (social:<network>) may be added in the future.
Supported hash_types:
- "plaintext" - Unhashed values
- "md5" - MD5 hash
- "sha1" - SHA-1 hash
- "sha256" - SHA-256 hash
Example identifiers:
{
"identifiers": [
{
"id_type": "email",
"hash_type": "plaintext",
"values": ["alice@example.com", "bob@example.com"]
},
{
"id_type": "phone",
"hash_type": "plaintext",
"values": ["+15551234567"]
}
]
}
format:
- When set to csv, json, or jsonl, generates an S3 presigned download URL
- URL expires in 1 hour
- Returns export metadata in response
identifier_types:
- Array of contact types to return in the identifiers field
- Allowed values depend on entity_type:
  - "person" → "name", "email", "phone", "address", "maid", "social:linkedin"
  - "business" → "name", "phone", "address", "social:linkedin", "website"
- Defaults: person → ["email"], business → ["name"]
- Values outside the set for the chosen entity_type are rejected
- Returns actual stored contact data from the resolved entity profiles
- Eliminates the need for a follow-up entity_enrich call to retrieve contact info
workflow_id:
- Optional UUID for tracking related tool calls in a session
- If not provided, a new workflow_id is generated
- Used for deterministic sampling and feedback correlation
Request Schema:
interface EntityResolveParams {
entity_type: "person" | "business";
identifiers?: Array<{
// person: "name" | "email" | "phone" | "address" | "maid" | "social:linkedin"
// business: "name" | "phone" | "address" | "website" | "social:linkedin"
id_type: string;
hash_type: "plaintext" | "md5" | "sha1" | "sha256";
values: string[];
}>;
csv_resource_uri?: string;
lookup_columns?: {
email?: { names: string[]; hash_type?: "plaintext" | "md5" | "sha1" | "sha256" };
phone?: { names: string[]; hash_type?: "plaintext" | "md5" | "sha1" | "sha256" };
address?: { names: string[] };
name?: { names: string[] };
linkedin?: { names: string[] };
domain?: { names: string[] };
};
offset?: number;
limit?: number;
format?: "none" | "csv" | "json" | "jsonl";
// person: Array<"name" | "email" | "phone" | "address" | "maid" | "social:linkedin">
// business: Array<"name" | "phone" | "address" | "social:linkedin" | "website">
identifier_types?: string[];
workflow_id?: string;
}
Output Format
Success Response:
{
entities: Array<{
entity_id: number;
overall_quality_score: number;
matches: Array<{
criterion_type: string;
criterion_value: string;
quality_score: number;
}>;
identifiers: {
[type: string]: string[];
};
address?: {
normalized_key: string;
// latitude, longitude, distance_meters are not returned in V2 responses
};
}>,
stats: {
requested: number,
resolved: number,
rate: number,
resolved_by_type: Record<string, number>
},
export?: {
url: string;
format: "csv" | "json" | "jsonl";
rows: number;
size_bytes: number;
expires_at: string;
resource_uri: string;
},
warnings?: Array<{ code: string; message: string }>,
tool_trace_id: string,
workflow_id: string
}
Response Fields:
| Field | Type | Description |
|---|---|---|
| entities | array | Array of resolved entities grouped by entity_id |
| entities[].entity_id | number | Entity ID |
| entities[].overall_quality_score | number | Noisy-OR aggregated confidence (0-1) across all matches |
| entities[].matches | array | Individual criterion matches with per-criterion scores |
| entities[].matches[].criterion_type | string | Type (e.g., "email_plaintext", "phone_md5") |
| entities[].matches[].criterion_value | string | The matched value |
| entities[].matches[].quality_score | number | Quality score for this specific match (0-1) |
| entities[].identifiers | object | Stored contact data, keyed by type |
| entities[].address | object | Address match data (only for address queries). Contains normalized_key; geo coordinates are not returned in V2 responses. |
| stats.requested | number | Total identifier values provided across all groups |
| stats.resolved | number | Distinct entities matched |
| stats.rate | number | Distinct entities resolved per identifier requested (resolved / requested), bounded to [0, 1] |
| stats.resolved_by_type | object | Distinct entities matched per identifier type (e.g. {"email": 171, "address": 226}). Each entity contributes at most 1 per type bucket regardless of how many criteria of that type matched it. |
| export | object | Export metadata (only when format is csv/json/jsonl) |
| export.url | string | Presigned S3 download URL (expires in 1 hour) |
| export.resource_uri | string | Workflow resource URI for the exported file |
| warnings | array | Optional. Non-fatal warnings raised during the run. CSV-mode resolution emits address_parse_low_yield when most address values failed libpostal parsing — typically a sign that the columns under lookup_columns.address.names were listed in the wrong order, or that a single mapped column contains only fragments without a postcode |
| tool_trace_id | string | OpenTelemetry trace ID for this tool execution |
| workflow_id | string | Workflow session identifier |
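For readers unfamiliar with Noisy-OR, the textbook form combines independent per-criterion scores as one minus the product of their complements. The sketch below shows that standard formula only; whether the service weights, deduplicates, or otherwise adjusts matches before aggregating is not stated here, so treat it as an aid for interpreting overall_quality_score rather than a reproduction of it.

```typescript
// Textbook Noisy-OR: probability that at least one independent signal is correct.
// Inputs are clamped to [0, 1] so a stray value cannot push the result out of range.
function noisyOr(scores: number[]): number {
  let probAllWrong = 1;
  for (const s of scores) {
    const clamped = Math.min(1, Math.max(0, s));
    probAllWrong *= 1 - clamped;
  }
  return 1 - probAllWrong;
}

// A single match keeps its own score; two matches reinforce each other.
const singleMatch = noisyOr([0.95]);
const twoMatches = noisyOr([0.8, 0.5]); // 1 - (0.2 * 0.5)
```

Under this formula an additional matching criterion can only raise the aggregate, which is consistent with the single-match example responses where overall_quality_score equals the one quality_score.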
Example Response (Email Resolution):
{
"entities": [
{
"entity_id": 123456,
"overall_quality_score": 0.95,
"matches": [
{
"criterion_type": "email_plaintext",
"criterion_value": "john.doe@example.com",
"quality_score": 0.95
}
],
"identifiers": {
"email": ["john.doe@example.com", "jdoe@work.com"]
}
}
],
"stats": {
"requested": 2,
"resolved": 1,
"rate": 0.5,
"resolved_by_type": { "email": 1 }
},
"tool_trace_id": "a1b2c3d4e5f6",
"workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}
Example Response (Address Resolution with Key-Based Matching):
{
"entities": [
{
"entity_id": 789012,
"overall_quality_score": 0.88,
"matches": [
{
"criterion_type": "address_plaintext",
"criterion_value": "123 Main St, San Francisco, CA 94105",
"quality_score": 0.88
}
],
"identifiers": {
"email": ["resident@example.com"]
},
"address": {
"normalized_key": "123 main st san francisco ca 94105 usa"
}
}
],
"stats": {
"requested": 1,
"resolved": 1,
"rate": 1.0,
"resolved_by_type": { "address": 1 }
},
"tool_trace_id": "a1b2c3d4e5f6",
"workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}
Error Handling
Common Errors:
- Both identifiers and csv_resource_uri provided: "identifiers and csv_resource_uri are mutually exclusive. Provide one or the other."
- Neither provided: "Either identifiers or csv_resource_uri must be provided."
- csv_resource_uri without column mappings: "When using csv_resource_uri, lookup_columns must specify at least one identifier type (email, phone, address, name, linkedin, or domain) with non-empty names."
- Address identifier with non-plaintext hash_type: "Address identifiers require hash_type 'plaintext' — address parsing cannot use hashed values"
- Social identifier (social:linkedin, etc.) with non-plaintext hash_type: "Social identifiers require hash_type 'plaintext' — slug normalization cannot use hashed values"
- Identifier type not valid for the chosen entity_type (e.g., maid for a business): "Identifier types not allowed for entity_type='business'. Allowed: name, phone, address, social:linkedin, website. Violations: identifiers[0].id_type='maid'."
- More than 50 identifier groups in identifiers: "Maximum 50 identifier groups allowed."
- More than 3,000 values in any identifier group: "Maximum 3000 values per identifier group."
- Service temporarily unavailable: "Failed to resolve entities. Please try again or contact support if the issue persists." Carries a structured details payload with cause, hint, and workflow_id so on-call can correlate with ClickStack; see the RESOLVE_ERROR_CAUSES enum in lib/util/classifyResolveError.ts for the bounded cause set.
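Most of the errors above can be caught before a request is sent. The following is a rough client-side pre-check under stated assumptions: `preflight` and `ResolveRequestLike` are hypothetical names, the returned strings are local descriptions rather than the server's messages, and only the rules listed on this page are checked.

```typescript
type EntityType = "person" | "business";

// Allowed id_type values per entity_type, as listed under Supported id_types.
const ALLOWED_ID_TYPES: Record<EntityType, string[]> = {
  person: ["name", "email", "phone", "address", "maid", "social:linkedin"],
  business: ["name", "phone", "address", "website", "social:linkedin"],
};

interface ResolveRequestLike {
  entity_type: EntityType;
  identifiers?: Array<{ id_type: string; hash_type: string; values: string[] }>;
  csv_resource_uri?: string;
}

// Returns a list of problems; an empty list means no documented rule was violated.
function preflight(req: ResolveRequestLike): string[] {
  const problems: string[] = [];
  const hasInline = req.identifiers !== undefined;
  const hasCsv = req.csv_resource_uri !== undefined;
  if (hasInline && hasCsv) problems.push("identifiers and csv_resource_uri are both set");
  if (!hasInline && !hasCsv) problems.push("neither identifiers nor csv_resource_uri is set");

  const groups = req.identifiers ?? [];
  if (groups.length > 50) problems.push("more than 50 identifier groups");

  const allowed = ALLOWED_ID_TYPES[req.entity_type];
  groups.forEach((g, i) => {
    if (allowed.indexOf(g.id_type) === -1) {
      problems.push("identifiers[" + i + "].id_type '" + g.id_type + "' not allowed for " + req.entity_type);
    }
    if (g.values.length > 3000) {
      problems.push("identifiers[" + i + "] has more than 3000 values");
    }
    // Address and social identifiers must be plaintext per the errors above.
    const plaintextOnly = g.id_type === "address" || g.id_type.indexOf("social:") === 0;
    if (plaintextOnly && g.hash_type !== "plaintext") {
      problems.push("identifiers[" + i + "] requires hash_type 'plaintext'");
    }
  });
  return problems;
}
```

A pre-check like this does not replace the server-side validation; it only avoids a round trip for the mistakes that are already documented.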
For files larger than 200,000 rows, paginate using the next_offset cursor returned in the response: pass it back as offset on the next call until the field is omitted (last page). offset, limit, and next_offset only apply on the CSV path; they are ignored when inline identifiers is used.
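The cursor loop described above can be sketched as follows. The transport is abstracted behind a hypothetical `fetchPage` callback, shown synchronously to keep the sketch short; a real tool call would most likely return a Promise. The assumption that next_offset sits at the top level of the response is also mine, since the output schema on this page does not list the field.

```typescript
interface CsvPage<T> {
  entities: T[];
  next_offset?: number; // omitted on the last page
}

// Read every page of a CSV-backed resolution by following next_offset.
// `fetchPage` is a hypothetical stand-in for one tool call with offset and limit set.
function resolveAllPages<T>(
  fetchPage: (offset: number, limit: number) => CsvPage<T>,
  limit: number = 200000
): T[] {
  const all: T[] = [];
  let offset: number | undefined = 0;
  while (offset !== undefined) {
    const page: CsvPage<T> = fetchPage(offset, limit);
    for (const entity of page.entities) all.push(entity);
    // Guard against a cursor that does not advance, which would loop forever.
    if (page.next_offset !== undefined && page.next_offset <= offset) {
      throw new Error("next_offset did not advance");
    }
    offset = page.next_offset;
  }
  return all;
}
```

The sketch simply appends pages. If the same entity can be matched by rows on different pages, merging by entity_id would be a further step that is left out here.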
Address Matching Behavior
- Addresses are parsed using libpostal into normalized components (street, city, state, zip, unit)
- Matching is performed at both the street level (address_plaintext) and unit level (address_unit_plaintext)
- When an input address has a unit AND the unit lookup matches at least one entity (i.e., the building has unit-precision data), entities matched for that input via the street criterion only (without a unit match for the same input) are dropped. This prevents the street-level fallback from returning the whole building when a more specific unit match is available.
- For unit-bearing inputs whose unit lookup returns nothing (no unit-precision data exists for the building), the street fallback is preserved on every matched entity, with a 0.6x penalty applied to its quality score as a signal that unit precision could not be established.
- Returns only the best-scoring entity(s) per input address
- Household members tied at max score are all returned
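The rules above can be modeled for a single input address roughly as follows. This is an illustrative sketch, not the service's implementation: `AddressCandidate` and `selectForInput` are hypothetical, each candidate is simplified to one row per entity, and applying the 0.6x penalty before picking the best score is an assumption about ordering.

```typescript
interface AddressCandidate {
  entity_id: number;
  matchedUnit: boolean; // true if the unit-level criterion matched for this input
  quality_score: number;
}

const STREET_FALLBACK_PENALTY = 0.6;

// Apply the unit-precision rules, then keep only the top-scoring entities (ties included).
function selectForInput(
  inputHasUnit: boolean,
  candidates: AddressCandidate[]
): AddressCandidate[] {
  let pool = candidates;
  if (inputHasUnit) {
    const unitMatches = candidates.filter((c) => c.matchedUnit);
    if (unitMatches.length > 0) {
      // Unit-precision data exists: drop entities matched via street only.
      pool = unitMatches;
    } else {
      // No unit data for the building: keep the street fallback, penalized.
      pool = candidates.map((c) => ({
        entity_id: c.entity_id,
        matchedUnit: c.matchedUnit,
        quality_score: c.quality_score * STREET_FALLBACK_PENALTY,
      }));
    }
  }
  let best = -Infinity;
  for (const c of pool) best = Math.max(best, c.quality_score);
  return pool.filter((c) => c.quality_score === best);
}
```

With a unit-bearing input and candidates where two household members matched the unit at 0.9 and a neighbor matched the street at 0.95, this model returns the two household members and drops the neighbor.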
CSV-mode parse-null warning: when csv_resource_uri is used with lookup_columns.address and most address values fail libpostal parsing (over half of the unique address inputs), the response includes a warnings[] entry with code address_parse_low_yield. This usually means the columns were listed in the wrong order, or that a single mapped column contains only fragments without a postcode.
List multi-column address mappings street-first — address1, address2?, city?, region?, postcode, country? — so libpostal sees a canonical address string.
The warning only fires on severe failures (>50% null parse rate). Silently depressed match rates from ordering mistakes that still parse won't trip it, so verify column order whenever the match rate is below expectation.
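Because the warning covers only severe failures, a caller may want to combine it with a match-rate check. A minimal sketch, assuming a caller-chosen `expectedRate` threshold (hypothetical, not part of the API) and using only the warnings and stats.rate fields documented above:

```typescript
interface ResolveWarning {
  code: string;
  message: string;
}

interface ResolveResponseLike {
  warnings?: ResolveWarning[];
  stats: { rate: number };
}

// True when the response carries the address_parse_low_yield warning.
function hasLowAddressYield(response: ResolveResponseLike): boolean {
  return (response.warnings ?? []).some((w) => w.code === "address_parse_low_yield");
}

// True when column order deserves a second look: either the warning fired,
// or the match rate fell below what the caller expected for this dataset.
function needsColumnOrderReview(response: ResolveResponseLike, expectedRate: number): boolean {
  return hasLowAddressYield(response) || response.stats.rate < expectedRate;
}
```

The threshold is dataset-specific; a value taken from a previous run on comparable data is one reasonable starting point.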
Usage Examples
Example 1: Simple email resolution
{
"entity_type": "person",
"identifiers": [
{
"id_type": "email",
"hash_type": "plaintext",
"values": ["alice@example.com", "bob@example.com"]
}
]
}
Example 2: Multi-criterion (email + phone)
{
"entity_type": "person",
"identifiers": [
{
"id_type": "email",
"hash_type": "plaintext",
"values": ["alice@example.com"]
},
{
"id_type": "phone",
"hash_type": "plaintext",
"values": ["+15551234567"]
}
]
}
Example 3: CSV resource input
{
"entity_type": "person",
"csv_resource_uri": "workflow://550e8400-e29b-41d4-a716-446655440000/uploads/customers.csv",
"lookup_columns": {
"email": { "names": ["email"] },
"phone": { "names": ["phone"] }
}
}
Example 3b: CSV with pre-hashed identifiers
{
"entity_type": "person",
"csv_resource_uri": "workflow://550e8400-e29b-41d4-a716-446655440000/uploads/customers.csv",
"lookup_columns": {
"email": { "names": ["email_md5"], "hash_type": "md5" },
"phone": { "names": ["phone_sha256"], "hash_type": "sha256" }
}
}
Example 4: Hashed identifiers with export
{
"entity_type": "person",
"identifiers": [
{
"id_type": "email",
"hash_type": "md5",
"values": ["5d41402abc4b2a76b9719d911017c592"]
}
],
"format": "csv"
}
Example 5: Request specific identifier types
{
"entity_type": "person",
"identifiers": [
{
"id_type": "email",
"hash_type": "plaintext",
"values": ["alice@example.com"]
}
],
"identifier_types": ["email", "phone", "name"]
}
Example 6a: Resolve a person by LinkedIn profile
Full profile URL and bare slug both normalize to the same lookup key:
{
"entity_type": "person",
"identifiers": [
{
"id_type": "social:linkedin",
"hash_type": "plaintext",
"values": ["https://www.linkedin.com/in/john-doe-070215/"]
}
],
"identifier_types": ["email", "phone", "social:linkedin"]
}
Example 6b: Resolve a business by LinkedIn company page
Bare slug and full URL both normalize to the same match, so the following two requests are equivalent:
{
"entity_type": "business",
"identifiers": [
{
"id_type": "social:linkedin",
"hash_type": "plaintext",
"values": ["tennis-en-padel-shop-noord"]
}
],
"identifier_types": ["name", "website", "social:linkedin"]
}
{
"entity_type": "business",
"identifiers": [
{
"id_type": "social:linkedin",
"hash_type": "plaintext",
"values": ["https://www.linkedin.com/company/tennis-en-padel-shop-noord/"]
}
],
"identifier_types": ["name", "website", "social:linkedin"]
}
Example 7: Resolve a business by website domain
{
"entity_type": "business",
"identifiers": [
{
"id_type": "website",
"hash_type": "plaintext",
"values": ["example.com"]
}
],
"identifier_types": ["name", "website"]
}
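Examples 6a and 6b rely on bare slugs and full URLs normalizing to the same lookup key. For callers who want to deduplicate LinkedIn inputs before sending them, the documented stripping rules (scheme, www., trailing slashes, path suffixes) can be approximated client-side. This is a sketch of those rules only: `normalizeLinkedInSlug` is a hypothetical helper, the server's actual normalization may differ, and regional subdomains or query strings are not handled.

```typescript
// Approximate the documented normalization: strip scheme, "www.", the
// linkedin.com host, the /in/ or /company/ prefix, then anything after the slug.
function normalizeLinkedInSlug(input: string): string {
  let s = input.trim();
  s = s.replace(/^https?:\/\//i, "");     // scheme
  s = s.replace(/^www\./i, "");           // www.
  s = s.replace(/^linkedin\.com\//i, ""); // host
  s = s.replace(/^(in|company)\//i, "");  // profile or company path prefix
  const cut = s.indexOf("/");             // trailing slash or path suffix
  return cut === -1 ? s : s.slice(0, cut);
}

const fromUrl = normalizeLinkedInSlug("https://www.linkedin.com/in/john-doe-070215/");
const fromSlug = normalizeLinkedInSlug("john-doe-070215");
```

Both calls yield the same slug, so a Set keyed on the normalized value would collapse the two input forms into one entry.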