resolve_identities

Description: Resolve person identities by matching emails, phones, addresses, or mobile advertising IDs (MAIDs). Supports querying multiple identifier types in a single call, with per-person quality scores aggregated by Noisy-OR. Returns person IDs grouped by individual with quality scores. Email addresses are automatically normalized (Gmail dots and plus-addressing are removed), and phone numbers are normalized to E.164 format.

Tool Identifier: resolve_identities

Input Parameters

Parameter	Type	Required	Default	Constraints	Description
multi_identifiers	array	Yes	-	Max 50 groups per request; each group's `values` array is capped at 3,000 entries — use `csv_resource_uri` for larger inputs	Query multiple identifier types simultaneously
format	string	No	"none"	"none", "csv", "json", "jsonl"	Export format - generates presigned S3 URL valid for 1 hour
identifier_types	array	No	["email"]	"name", "email", "phone", "address", "maid"	Contact types to return in the identifiers field from resolved person profiles
workflow_id	string	No	-	Valid UUID	Workflow session identifier for correlation

Parameter Details:

multi_identifiers:

Array of objects, each specifying id_type, hash_type, and values[]
Allows querying across different identifier types in one call
Email/phone/maid can be mixed in a single call
Address must be in a separate call (uses geospatial H3 matching)
Returns Noisy-OR aggregated overall_quality_score per person
Capped at 50 identifier groups per request — split larger inputs into multiple calls
Each identifier group's values array is capped at 3,000 entries — for larger inputs use csv_resource_uri (governed by a separate 200,000-row cap)
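
The two inline caps can be applied client-side before any call is made. What follows is a minimal sketch rather than part of the tool: `chunkGroups` and `InlineGroup` are hypothetical names, and only the limits of 50 groups and 3,000 values come from the constraints above. Address groups are left out of the type because they go in a separate call with their own batch limit.

```typescript
// Hypothetical client-side helper; only the two numeric limits come from this reference.
interface InlineGroup {
  id_type: "email" | "phone" | "maid"; // address groups are sent in a separate call
  hash_type: "plaintext" | "md5" | "sha1" | "sha256";
  values: string[];
}

const MAX_GROUPS_PER_REQUEST = 50;
const MAX_VALUES_PER_GROUP = 3000;

// Returns one array of groups per request, each within both caps.
function chunkGroups(groups: InlineGroup[]): InlineGroup[][] {
  // Step 1: split any group whose values array exceeds the per-group cap.
  const split: InlineGroup[] = [];
  for (const group of groups) {
    for (let i = 0; i < group.values.length; i += MAX_VALUES_PER_GROUP) {
      split.push({ ...group, values: group.values.slice(i, i + MAX_VALUES_PER_GROUP) });
    }
  }
  // Step 2: pack the resulting groups into batches of at most 50.
  const requests: InlineGroup[][] = [];
  for (let i = 0; i < split.length; i += MAX_GROUPS_PER_REQUEST) {
    requests.push(split.slice(i, i + MAX_GROUPS_PER_REQUEST));
  }
  return requests;
}
```

Each element of the returned array can be passed as the multi_identifiers value of one call. Inputs large enough to produce many requests are probably better served by csv_resource_uri.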

Supported id_types:

"email" - Email addresses with automatic normalization
"phone" - Phone numbers (E.164 format recommended)
"address" - Physical addresses (geospatial matching via H3 resolution 11, ~28m precision)
"maid" - Mobile advertising IDs (IDFA for iOS, GAID for Android)

Supported hash_types:

"plaintext" - Unhashed values
"md5" - MD5 hash
"sha1" - SHA-1 hash
"sha256" - SHA-256 hash

Example multi_identifiers:

{
  "multi_identifiers": [
    {
      "id_type": "email",
      "hash_type": "plaintext",
      "values": ["alice@example.com", "bob@example.com"]
    },
    {
      "id_type": "phone",
      "hash_type": "plaintext",
      "values": ["+15551234567"]
    },
    {
      "id_type": "maid",
      "hash_type": "sha256",
      "values": ["abc123..."]
    }
  ]
}
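
A hashed value cannot be normalized after the fact, so a caller sending md5, sha1, or sha256 emails may want to normalize before hashing. The sketch below applies the rules named in the description (Gmail dots and plus-addressing removed). It assumes, beyond what this reference states, that both rules apply only to gmail.com addresses and that values are trimmed and lowercased; `normalizeEmail` is a hypothetical helper, not part of the tool.

```typescript
// Sketch only: the server-side normalization rules are not fully specified in this reference.
function normalizeEmail(raw: string): string {
  const email = raw.trim().toLowerCase();
  const at = email.lastIndexOf("@");
  if (at <= 0) {
    return email; // no local part or no "@": leave it for the service to reject
  }
  let local = email.slice(0, at);
  const domain = email.slice(at + 1);
  if (domain === "gmail.com") {
    // Assumption: drop the "+tag" suffix first, then remove dots from the local part.
    local = local.replace(/\+.*$/, "").replace(/\./g, "");
  }
  return `${local}@${domain}`;
}
```

For example, `normalizeEmail("John.Doe+news@Gmail.com")` yields `johndoe@gmail.com`, while `a.b@example.com` is returned unchanged apart from trimming and lowercasing.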

format:

When set to csv, json, or jsonl, generates a presigned S3 download URL
The URL expires in 1 hour
Export metadata is returned in the export field of the response

identifier_types:

Array of contact types to return in the identifiers field
Valid types: "name", "email", "phone", "address", "maid"
Default: ["email"]
Returns actual stored contact data from the resolved person profiles
Eliminates need for follow-up get_person call to retrieve contact info
The identifiers field may be an empty object {} if the person has no stored contacts matching the requested types
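
Because identifiers can be an empty object, or can lack a requested type, client code should not assume a key is present. A minimal sketch of a defensive accessor follows; `contactsOfType` and the trimmed-down `ResolvedIdentity` shape are hypothetical and cover only the fields used here.

```typescript
// Only the fields needed for this sketch; the full shape is in the Output Format section.
interface ResolvedIdentity {
  person_id: number;
  identifiers: { [type: string]: string[] | undefined };
}

// Returns the stored contacts of one type, or an empty array when none were returned.
function contactsOfType(identity: ResolvedIdentity, type: string): string[] {
  return identity.identifiers[type] ?? [];
}

// Builds a person_id -> contacts map, skipping people with no contacts of that type.
function contactsByPerson(identities: ResolvedIdentity[], type: string): Map<number, string[]> {
  const result = new Map<number, string[]>();
  for (const identity of identities) {
    const contacts = contactsOfType(identity, type);
    if (contacts.length > 0) {
      result.set(identity.person_id, contacts);
    }
  }
  return result;
}
```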

workflow_id:

Optional UUID for tracking related tool calls in a session
If not provided, a new workflow_id is generated
Used for deterministic sampling and feedback correlation

Request Schema:

interface ResolveIdentitiesParams {
  multi_identifiers: Array<{
    id_type: "email" | "phone" | "address" | "maid";
    hash_type: "plaintext" | "md5" | "sha1" | "sha256";
    values: string[];
  }>;
  format?: "none" | "csv" | "json" | "jsonl";
  identifier_types?: Array<"name" | "email" | "phone" | "address" | "maid">;
  workflow_id?: string;
}

Output Format

Success Response:

{
  identities: Array<{
    person_id: number;
    overall_quality_score: number;
    matches: Array<{
      criterion_type: string;
      criterion_value: string;
      quality_score: number;
    }>;
    identifiers: {
      [type: string]: string[];  // e.g., { email: ["a@example.com"], phone: ["+15551234567"] }
    };
    address?: {
      normalized_address: string;
      latitude: number;
      longitude: number;
      distance_meters: number;
    };
  }>,
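  stats: {
    requested: number,
    resolved: number,
    rate: number,
    resolved_by_type?: { [type: string]: number }  // per-type counts; not shown in the example responses
  },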
  export?: {
    url: string;
    format: "csv" | "json" | "jsonl";
    rows: number;
    size_bytes: number;
    expires_at: string;
  },
  tool_trace_id: string,
  workflow_id: string
}

Response Fields:

Field	Type	Description
identities	array	Array of resolved identities grouped by person_id
identities[].person_id	number	Person ID
identities[].overall_quality_score	number	Noisy-OR aggregated confidence (0-1) across all matches
identities[].matches	array	Individual criterion matches with per-criterion scores
identities[].matches[].criterion_type	string	Type (e.g., "email_plaintext", "phone_md5", "maid_sha256")
identities[].matches[].criterion_value	string	The matched value
identities[].matches[].quality_score	number	Quality score for this specific match (0-1)
identities[].identifiers	object	Stored contact data from the person profile, keyed by type (e.g., email, phone). Contains only the types requested in the identifier_types parameter and may be an empty object {}.
identities[].address	object	Geocoding data with distance (only for address queries)
identities[].address.normalized_address	string	Normalized address string
identities[].address.latitude	number	Latitude coordinate
identities[].address.longitude	number	Longitude coordinate
identities[].address.distance_meters	number	Distance from query address in meters
stats.requested	number	Total identifier values provided across all groups
stats.resolved	number	Distinct identities matched
stats.rate	number	Distinct identities resolved per identifier requested: `rate = resolved / requested`, bounded to `[0, 1]`
stats.resolved_by_type	object	Distinct identities matched per identifier type (e.g. `{"email": 171, "address": 226}`). Each identity contributes at most 1 per type bucket regardless of how many criteria of that type matched it.
export	object	Export metadata (only when format is csv/json/jsonl)
export.url	string	Presigned S3 download URL (expires in 1 hour)
export.format	string	Export format
export.rows	number	Number of rows in export
export.size_bytes	number	File size in bytes
export.expires_at	string	ISO 8601 expiration timestamp
tool_trace_id	string	OpenTelemetry trace ID for this tool execution
workflow_id	string	Workflow session identifier
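
In its standard form, Noisy-OR treats each match as independent evidence and combines per-match scores as one minus the product of (1 - score). The sketch below assumes the service uses that unweighted form to derive overall_quality_score from matches[].quality_score, which this reference does not confirm; `noisyOr` is a hypothetical helper name.

```typescript
// Standard unweighted Noisy-OR: 1 - product of (1 - q_i) over all per-match scores.
function noisyOr(scores: number[]): number {
  let probabilityAllMiss = 1;
  for (const q of scores) {
    if (q < 0 || q > 1) {
      throw new RangeError(`quality score out of range: ${q}`);
    }
    probabilityAllMiss *= 1 - q;
  }
  return 1 - probabilityAllMiss;
}
```

With a single match the result equals that match's score, as in the email example where both values are 0.95. Two matches scored 0.9 and 0.5 combine to about 0.95, so additional matches can only raise the overall score.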

Example Response (Email Resolution):

{
  "identities": [
    {
      "person_id": 123456,
      "overall_quality_score": 0.95,
      "matches": [
        {
          "criterion_type": "email_plaintext",
          "criterion_value": "john.doe@example.com",
          "quality_score": 0.95
        }
      ],
      "identifiers": {
        "email": ["john.doe@example.com", "jdoe@work.com"]
      }
    }
  ],
  "stats": {
    "requested": 2,
    "resolved": 1,
    "rate": 0.5
  },
  "tool_trace_id": "a1b2c3d4e5f6",
  "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}

Example Response (Multi-Criterion with Export):

{
  "identities": [],
  "stats": {
    "requested": 3,
    "resolved": 2,
    "rate": 0.67
  },
  "export": {
    "url": "https://s3.amazonaws.com/bucket/file.csv?...",
    "format": "csv",
    "rows": 2,
    "size_bytes": 1024,
    "expires_at": "2025-01-16T12:00:00Z"
  },
  "tool_trace_id": "a1b2c3d4e5f6",
  "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}
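
Because the presigned URL expires after an hour, a caller that queues downloads may want to check expires_at before fetching. A small sketch, with `isExportUsable` and the one-minute safety margin as illustrative choices rather than anything the tool prescribes:

```typescript
// Mirrors the export object in the Success Response schema.
interface ExportMetadata {
  url: string;
  format: "csv" | "json" | "jsonl";
  rows: number;
  size_bytes: number;
  expires_at: string; // ISO 8601 timestamp
}

// True when the URL still has more than `safetyMarginMs` of lifetime left.
function isExportUsable(
  meta: ExportMetadata,
  now: Date = new Date(),
  safetyMarginMs: number = 60_000,
): boolean {
  const expiresAt = Date.parse(meta.expires_at);
  if (Number.isNaN(expiresAt)) {
    return false; // unparseable timestamp: treat as expired
  }
  return now.getTime() + safetyMarginMs < expiresAt;
}
```

If the check fails, the only remedy this reference suggests is to repeat the call with format set again so that a fresh URL is generated.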

Example Response (Address Resolution with Distance):

{
  "identities": [
    {
      "person_id": 789012,
      "overall_quality_score": 0.88,
      "matches": [
        {
          "criterion_type": "address_h3_11",
          "criterion_value": "123 Main St, San Francisco, CA 94105",
          "quality_score": 0.88
        }
      ],
      "identifiers": {
        "email": ["resident@example.com"]
      },
      "address": {
        "normalized_address": "123 Main St, San Francisco, CA 94105, USA",
        "latitude": 37.7749,
        "longitude": -122.4194,
        "distance_meters": 15.3
      }
    }
  ],
  "stats": {
    "requested": 1,
    "resolved": 1,
    "rate": 1.0
  },
  "tool_trace_id": "a1b2c3d4e5f6",
  "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}

Error Handling

Error Response Format:

{
  "content": [
    {
      "type": "text",
      "text": "Identity resolution failed: <error message>"
    }
  ],
  "isError": true
}

Common Errors:

Empty identifiers array: "At least one identifier required"
Invalid identifier format: "Invalid email/phone/address format"
More than 3,000 identifiers in a single request: "Maximum 3000 identifiers allowed"
More than 50 identifier groups in multi_identifiers: "Maximum 50 identifier groups allowed"
More than 3,000 values in any identifier group: "Maximum 3000 values per identifier group"
Service temporarily unavailable: "Request failed - please try again"
Request timeout: "Request took too long - try reducing batch size"
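
A caller can branch on these messages once it has detected isError. The following sketch extracts the message from the error envelope shown above and maps it to a suggested action. The mapping itself is an assumption based on the wording of the messages, and `classifyError` and `ErrorAction` are hypothetical names.

```typescript
interface ToolResponse {
  content: Array<{ type: string; text: string }>;
  isError?: boolean;
}

type ErrorAction = "retry" | "reduce_batch" | "fix_input";

const ERROR_PREFIX = "Identity resolution failed: ";

// Returns null for successful responses, otherwise the bare message and a suggested action.
function classifyError(res: ToolResponse): { message: string; action: ErrorAction } | null {
  if (!res.isError) {
    return null;
  }
  const text = res.content.find((item) => item.type === "text")?.text ?? "";
  const message = text.startsWith(ERROR_PREFIX) ? text.slice(ERROR_PREFIX.length) : text;

  let action: ErrorAction = "fix_input"; // validation errors need a changed request
  if (message.includes("please try again")) {
    action = "retry"; // service temporarily unavailable
  } else if (message.includes("try reducing batch size")) {
    action = "reduce_batch"; // request timeout
  }
  return { message, action };
}
```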

When csv_resource_uri points to a file with more than 200,000 rows, paginate using the next_offset cursor returned in the response. Pass it back as offset on the next call, and repeat until next_offset is omitted, which marks the last page. The offset, limit, and next_offset fields apply only on the CSV path; they are ignored when inline identifiers or multi_identifiers are used.
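
The pagination loop can be sketched as follows. `callTool` stands in for whatever client function invokes resolve_identities. That function, the `CsvPage` shape, and the numeric type of next_offset are assumptions, since this reference names the fields but does not give their schema.

```typescript
// Assumed request and response shapes for the CSV path; only the field names come from this reference.
interface CsvPageParams {
  csv_resource_uri: string;
  offset?: number;
  limit?: number;
}

interface CsvPage<T> {
  identities: T[];
  next_offset?: number; // omitted on the last page
}

type CallTool<T> = (params: CsvPageParams) => Promise<CsvPage<T>>;

// Follows next_offset until it is omitted and returns all identities in page order.
async function resolveAllPages<T>(callTool: CallTool<T>, uri: string, limit?: number): Promise<T[]> {
  const all: T[] = [];
  let offset: number | undefined = undefined;
  do {
    const params: CsvPageParams = { csv_resource_uri: uri };
    if (offset !== undefined) params.offset = offset;
    if (limit !== undefined) params.limit = limit;

    const page: CsvPage<T> = await callTool(params);
    for (const identity of page.identities) {
      all.push(identity);
    }
    offset = page.next_offset;
  } while (offset !== undefined);
  return all;
}
```

Collecting every page in memory is only reasonable for modest result sets; for large files a caller would more likely process each page as it arrives.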

Address Matching Behavior

Returns only the best-scoring person(s) per input address (not all people in the H3 cell)
Minimum quality threshold of 0.75 (~31m distance) excludes weak matches
Household members tied at max score are all returned
Maximum batch size: 1,000 addresses per request
Quality scores use distance-based decay: full score within 5m, linear decay to 10% at 100m, floor at 10% beyond
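
The decay rule and the 0.75 threshold can be checked against each other with a short calculation. This sketch implements only the curve as worded above (1.0 up to 5 m, linear down to 0.1 at 100 m, 0.1 beyond). `distanceDecay` and `maxDistanceForScore` are hypothetical names, and the service may combine this factor with other signals: the address example in this document reports 0.88 at 15.3 m, where the bare curve gives about 0.90.

```typescript
const FULL_SCORE_RADIUS_M = 5;
const DECAY_END_M = 100;
const FLOOR_SCORE = 0.1;

// Distance-based decay: 1.0 within 5 m, linear to 0.1 at 100 m, 0.1 beyond.
function distanceDecay(distanceMeters: number): number {
  if (distanceMeters <= FULL_SCORE_RADIUS_M) return 1;
  if (distanceMeters >= DECAY_END_M) return FLOOR_SCORE;
  const fraction = (distanceMeters - FULL_SCORE_RADIUS_M) / (DECAY_END_M - FULL_SCORE_RADIUS_M);
  return 1 - (1 - FLOOR_SCORE) * fraction;
}

// Inverse of the linear segment: the largest distance that still reaches `score`.
function maxDistanceForScore(score: number): number {
  if (score <= FLOOR_SCORE || score > 1) {
    throw new RangeError("score must be in (0.1, 1]");
  }
  const fraction = (1 - score) / (1 - FLOOR_SCORE);
  return FULL_SCORE_RADIUS_M + fraction * (DECAY_END_M - FULL_SCORE_RADIUS_M);
}
```

`maxDistanceForScore(0.75)` comes out at roughly 31.4 m, which agrees with the ~31m figure quoted for the minimum quality threshold.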

Performance Notes

Supports batch processing of multiple identifiers in a single request
Returns quality scores for each resolved identity
Results ordered by quality score (highest quality first)
Inline requests are limited to 50 identifier groups of up to 3,000 values each; large batches within those limits may still time out

Usage Examples

Example 1: Simple email resolution

{
  "multi_identifiers": [
    {
      "id_type": "email",
      "hash_type": "plaintext",
      "values": ["alice@example.com", "bob@example.com"]
    }
  ]
}

Example 2: Phone number resolution

{
  "multi_identifiers": [
    {
      "id_type": "phone",
      "hash_type": "plaintext",
      "values": ["+15551234567", "+442071234567"]
    }
  ]
}

Example 3: MAID resolution with hashing

{
  "multi_identifiers": [
    {
      "id_type": "maid",
      "hash_type": "sha256",
      "values": [
        "a665a45920422f9d417e4867efdc4fb8a04a1f3fff1fa07e998e86f7f7a27ae3"
      ]
    }
  ]
}

Example 4: Multi-criterion (email + phone + maid)

{
  "multi_identifiers": [
    {
      "id_type": "email",
      "hash_type": "plaintext",
      "values": ["alice@example.com"]
    },
    {
      "id_type": "phone",
      "hash_type": "plaintext",
      "values": ["+15551234567"]
    },
    {
      "id_type": "maid",
      "hash_type": "md5",
      "values": ["098f6bcd4621d373cade4e832627b4f6"]
    }
  ]
}

Example 5: Address resolution with distance

{
  "multi_identifiers": [
    {
      "id_type": "address",
      "hash_type": "plaintext",
      "values": ["123 Main St, San Francisco, CA 94105"]
    }
  ]
}

Example 6: Request specific identifier types

{
  "multi_identifiers": [
    {
      "id_type": "email",
      "hash_type": "plaintext",
      "values": ["alice@example.com"]
    }
  ],
  "identifier_types": ["email", "phone"]
}

Example 7: Export to CSV

{
  "multi_identifiers": [
    {
      "id_type": "email",
      "hash_type": "md5",
      "values": [
        "5d41402abc4b2a76b9719d911017c592",
        "098f6bcd4621d373cade4e832627b4f6"
      ]
    }
  ],
  "format": "csv"
}
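
The two MD5 values in Example 7 are the digests of the strings hello and test, and the SHA-256 value in Example 3 is the digest of 123, so they are placeholders rather than real emails or MAIDs. A minimal sketch of producing hashed values with Node's built-in crypto module follows. It assumes lowercase hex digests of the UTF-8 bytes, which matches the examples but is not stated as a requirement in this reference; `hashValue` and `toHashedGroup` are hypothetical helper names.

```typescript
import { createHash } from "node:crypto";

type HashedType = "md5" | "sha1" | "sha256";

// Lowercase hex digest of the UTF-8 bytes of `value` (assumed encoding).
function hashValue(value: string, hashType: HashedType): string {
  return createHash(hashType).update(value, "utf8").digest("hex");
}

// Builds one multi_identifiers group from plaintext values.
function toHashedGroup(
  idType: "email" | "phone" | "maid",
  hashType: HashedType,
  plaintextValues: string[],
): { id_type: string; hash_type: HashedType; values: string[] } {
  return {
    id_type: idType,
    hash_type: hashType,
    values: plaintextValues.map((value) => hashValue(value, hashType)),
  };
}
```

Any normalization (for example trimming and lowercasing emails) has to happen before hashing, since the digest of an unnormalized string will not match the digest of its normalized form.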