Resolve each row of a CSV file to a platform entity ID and optionally append enrichment data, preserving the original row structure.
Quick Example
{
"entity_type": "person",
"csv_resource_uri": "workflow://550e8400-e29b-41d4-a716-446655440000/uploads/customers.csv",
"lookup_columns": { "email": { "names": ["email"] } },
"domains": ["demographic"],
"workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}
Input Parameters
| Parameter | Type | Required | Default | Constraints | Description |
|---|---|---|---|---|---|
| entity_type | string | Yes | - | "person" or "business" | Type of entity to resolve |
| csv_key | string | Conditional | - | Filename | CSV filename from generate_upload_url. Mutually exclusive with csv_resource_uri |
| csv_resource_uri | string | Conditional | - | workflow:// URI | CSV resource URI. Mutually exclusive with csv_key |
| lookup_columns | object | Conditional | - | See below | Column-mapping for CSV-based resolution. Groups each identifier type's column names (and hash_type for email/phone) under one object. At least one sub-key with non-empty names is required |
| tiebreaker_hierarchy | array | No | ["email", "phone", "address", "linkedin", "domain", "name"] | Ordered array | Priority for divergent identifier resolution |
| min_score_threshold | number | No | 0.0 | 0.0-1.0 | Minimum match score threshold |
| domains | array | Yes | - | Enrichment domains | Enrichment domains to include in output |
| include_unmatched | boolean | No | true | - | Include rows with no matches in output |
| include_score_breakdown | boolean | No | false | - | Include detailed score breakdown per identifier |
| workflow_id | string | Yes | - | Valid UUID | Workflow session ID from generate_upload_url |
| offset | number | No | 0 | Integer ≥ 0 | Number of CSV data rows to skip before processing. Use with limit to paginate large CSVs across multiple calls |
| limit | number | No | 40000 | 1 ≤ limit ≤ 40000 | Maximum number of CSV data rows to process in this call. To process larger files, call repeatedly with increasing offset values |
Parameter Details:
csv_key vs csv_resource_uri:
- Provide exactly one. They are mutually exclusive.
- csv_resource_uri must point to the uploads or artifacts sub-path of a workflow URI (e.g. workflow://{workflow_id}/uploads/file.csv or workflow://{workflow_id}/artifacts/file.csv). Other sub-paths are rejected.
- At least one identifier column must be specified via lookup_columns.
- The CSV is processed in pages of up to 40,000 rows per call. When more rows remain, the response includes a next_offset field; pass it back as offset on the next call. The field is omitted on the last page.
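The paging rule above can be sketched as a small helper that derives the next request from the previous response. This is a minimal sketch, not part of the tool: `nextPageRequest` and `PagedResponse` are hypothetical names, and it assumes `next_offset` is a top-level field of the response, which the output schema in this document does not spell out.

```typescript
// Only the paging-related response field is modeled here.
interface PagedResponse {
  // Assumption: next_offset is a top-level response field, absent on the last page.
  next_offset?: number;
}

// Builds the request for the following page, or returns null after the last page.
function nextPageRequest<T extends object>(
  request: T,
  response: PagedResponse
): (T & { offset: number }) | null {
  if (response.next_offset === undefined) {
    return null; // field omitted: nothing left to process
  }
  // Copy every original parameter and only move the offset forward.
  return { ...request, offset: response.next_offset };
}
```

A caller would send the first request, keep each response, and repeat with the value returned by `nextPageRequest` until it returns null. Comparing against `undefined` rather than testing truthiness keeps an offset of 0 from being mistaken for "no more pages".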
lookup_columns:
Maps CSV columns to identifier types. The same shape is used by entity_resolve — see Conventions → CSV Column Mapping for the canonical reference, including per-identifier rules, multi-column address joining, and the address_parse_low_yield warning.
{
email?: { names: string[]; hash_type?: "plaintext" | "md5" | "sha1" | "sha256" },
phone?: { names: string[]; hash_type?: "plaintext" | "md5" | "sha1" | "sha256" },
address?: { names: string[] },
name?: { names: string[] },
linkedin?: { names: string[] }, // resolves via the `social:linkedin` identifier type
domain?: { names: string[] } // business entities — resolves via the `website` identifier type
}
At least one sub-key with non-empty names is required. Only email and phone accept hash_type; the other types require plaintext values (address parsing, name fuzzy-matching, LinkedIn slug normalization, and domain normalization cannot run on hashed inputs).
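The two rules above (at least one non-empty names list, and hash_type only on email and phone) can be checked on the client before a call is made. The sketch below is illustrative only: `checkLookupColumns` and its messages are hypothetical, and the server remains the authority on what it accepts.

```typescript
type IdentifierKey = "email" | "phone" | "address" | "name" | "linkedin" | "domain";

// hash_type is typed loosely so the check also works on untyped JSON input.
interface ColumnGroup {
  names: string[];
  hash_type?: string;
}

type LookupColumns = Partial<Record<IdentifierKey, ColumnGroup>>;

const ALL_KEYS: IdentifierKey[] = ["email", "phone", "address", "name", "linkedin", "domain"];
const HASHABLE_KEYS: IdentifierKey[] = ["email", "phone"];

// Returns a list of problems; an empty list means both rules are satisfied.
function checkLookupColumns(lookup: LookupColumns): string[] {
  const problems: string[] = [];
  let hasNames = false;
  for (const key of ALL_KEYS) {
    const group = lookup[key];
    if (group === undefined) continue;
    if (group.names.length > 0) hasNames = true;
    if (group.hash_type !== undefined && HASHABLE_KEYS.indexOf(key) === -1) {
      problems.push("lookup_columns." + key + " does not accept hash_type");
    }
  }
  if (!hasNames) {
    problems.push("lookup_columns needs at least one sub-key with non-empty names");
  }
  return problems;
}
```

For example, `checkLookupColumns({ address: { names: ["addr"], hash_type: "sha256" } })` reports one problem, because address values must be plaintext.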
Address columns (single vs split):
- One column: each cell must be a complete address line, e.g. "123 Main St, San Francisco CA 94110, US".
- Multiple columns (most CRM exports, including Shopify, HubSpot, Salesforce, and BigCommerce, split the address across several columns): list them under address.names in street-first order (address1, address2?, city?, region?, postcode, country?); the server joins the row's cell values with ", " before libpostal parsing.
Column order affects match quality, not just parse success: a wrong-order list often parses but maps components to the wrong slots and depresses match rates silently. Verify ordering whenever match rates are below expectation.
If more than 50% of address values fail libpostal parsing, the response includes a warnings[] entry with code address_parse_low_yield.
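To preview what the parser is likely to receive for a split-address row, the join described above can be approximated locally. This is a sketch of the documented behavior (join with ", ", skip empty cells), not the server's code; `joinAddress` is a hypothetical helper, and trimming whitespace around each cell is an assumption.

```typescript
type CsvRow = Record<string, string | undefined>;

// Joins the listed columns in the given order, skipping missing or blank cells.
function joinAddress(row: CsvRow, names: string[]): string {
  const parts: string[] = [];
  for (const name of names) {
    const value = (row[name] ?? "").trim(); // trimming is an assumption
    if (value !== "") parts.push(value);
  }
  return parts.join(", ");
}
```

Running this over a few sample rows with the intended address.names list is a quick way to spot an ordering mistake: a street-first list reads like a normal address line, while a wrong-order list produces a string that starts with the postcode or country.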
Migration from legacy *_columns keys: The flat email_columns, phone_columns, address_columns, name_columns, and linkedin_columns parameters from earlier V2 betas are rejected with a per-key error naming the lookup_columns.<key> replacement. Update existing callers to the nested shape.
tiebreaker_hierarchy:
- When multiple identifiers resolve to different entities, the hierarchy determines which entity wins.
- Default: ["email", "phone", "address", "linkedin", "domain", "name"] (email has highest priority).
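The ordering rule can be illustrated with a short sketch: walk the hierarchy and take the first identifier type that produced a match. This is only a model of the documented priority rule under simplifying assumptions; it ignores match scores and min_score_threshold, `pickWinner` is a hypothetical name, and the entity IDs used with it are made up.

```typescript
type IdentifierType = "email" | "phone" | "address" | "linkedin" | "domain" | "name";

const DEFAULT_HIERARCHY: IdentifierType[] = ["email", "phone", "address", "linkedin", "domain", "name"];

interface Winner {
  identifier: IdentifierType;
  entityId: string;
}

// candidates maps each identifier type that matched to the entity ID it resolved to.
function pickWinner(
  candidates: Partial<Record<IdentifierType, string>>,
  hierarchy: IdentifierType[] = DEFAULT_HIERARCHY
): Winner | null {
  for (const identifier of hierarchy) {
    const entityId = candidates[identifier];
    if (entityId !== undefined) {
      return { identifier, entityId }; // first match in priority order wins
    }
  }
  return null; // no identifier type listed in the hierarchy matched
}
```

With the default hierarchy, a row whose email and phone resolve to different entities keeps the email's entity; passing `["phone", "email"]` as the hierarchy reverses that outcome.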
domains (enrichment):
domains accepts both identifier-kind and trait-kind values. Identifier-kind values (name, email, phone, address, maid, website, social) are accepted regardless of entity_type. Allowed trait-kind values depend on entity_type:
- person → affinity, content, demographic, employment, financial, household, intent, interest, lifestyle, political, purchase
- business → about, appstore, digital, funding, hiring, industry, techstack
Note: domains accepts any string at the input boundary, so values outside the allowed set for the chosen entity_type are not rejected — they are simply not enriched. The column is created but populated with no data, which is easy to mistake for missing source data. Match the value to the right entity type to get a populated column.
When specified, domains adds enrichment columns (prefixed with _) to the output CSV.
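Because mismatched values are accepted silently, a client-side check is one way to catch them before they turn into empty columns. The sketch below copies the value lists from this section; `unsupportedDomains` is a hypothetical helper, and the lists would need updating if the platform adds domains.

```typescript
type EntityType = "person" | "business";

// Identifier-kind values are accepted for either entity type.
const IDENTIFIER_DOMAINS: string[] = ["name", "email", "phone", "address", "maid", "website", "social"];

// Trait-kind values depend on the entity type.
const TRAIT_DOMAINS: Record<EntityType, string[]> = {
  person: [
    "affinity", "content", "demographic", "employment", "financial", "household",
    "intent", "interest", "lifestyle", "political", "purchase"
  ],
  business: ["about", "appstore", "digital", "funding", "hiring", "industry", "techstack"]
};

// Returns the requested domains that would produce an empty column for this entity type.
function unsupportedDomains(entityType: EntityType, domains: string[]): string[] {
  const allowed = IDENTIFIER_DOMAINS.concat(TRAIT_DOMAINS[entityType]);
  return domains.filter((domain) => allowed.indexOf(domain) === -1);
}
```

For instance, `unsupportedDomains("business", ["industry", "demographic"])` returns `["demographic"]`, flagging a person-only trait requested on a business run.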
Output columns added to CSV:
- _entity_id - Resolved entity ID
- _match_score - Overall match confidence (0-1)
- _match_method - How the match was made (composite, single, tiebreaker)
- _matched_identifiers - Which identifiers matched
- _tiebreaker_winner - Which identifier type won the tiebreak (always present in the CSV header; only populated for rows resolved via the tiebreaker method)
- _{domain} - Domain enrichment data (when domains specified)
Request Schema:
interface ResolveAndEnrichRowsParams {
entity_type: "person" | "business";
csv_key?: string;
csv_resource_uri?: string;
lookup_columns?: {
email?: { names: string[]; hash_type?: "plaintext" | "md5" | "sha1" | "sha256" };
phone?: { names: string[]; hash_type?: "plaintext" | "md5" | "sha1" | "sha256" };
address?: { names: string[] };
name?: { names: string[] };
linkedin?: { names: string[] };
domain?: { names: string[] };
};
tiebreaker_hierarchy?: Array<"email" | "phone" | "address" | "name" | "linkedin" | "domain">;
min_score_threshold?: number;
domains: string[];
include_unmatched?: boolean;
include_score_breakdown?: boolean;
workflow_id: string;
offset?: number;
limit?: number;
}
Output Format
{
export: {
download_url: string;
format: "csv";
expires_at: string;
},
stats: {
total_rows: number;
matched_rows: number;
unmatched_rows: number;
match_rate: number;
output_column_count: number;
by_method: {
composite: number;
single: number;
tiebreaker: number;
};
avg_score: number;
},
warnings?: Array<{ code: string; message: string }>,
tool_trace_id: string,
workflow_id: string
}
Response Fields:
| Field | Type | Description |
|---|---|---|
| export.download_url | string | Presigned URL for enriched CSV |
| export.format | string | Always "csv" |
| export.expires_at | string | ISO 8601 expiration timestamp |
| stats.total_rows | number | Total input rows processed |
| stats.matched_rows | number | Rows with successful matches |
| stats.unmatched_rows | number | Rows with no matches |
| stats.match_rate | number | Match rate (0-1) |
| stats.output_column_count | number | Number of columns in output CSV |
| stats.by_method.composite | number | Rows matched via multiple identifiers |
| stats.by_method.single | number | Rows matched via single identifier |
| stats.by_method.tiebreaker | number | Rows resolved via tiebreaker |
| stats.avg_score | number | Average match score |
| warnings | array | Optional. Non-fatal warnings, e.g. address_parse_low_yield when most address values failed libpostal parsing — typically a sign that the columns under address.names were listed in the wrong order, or that a single mapped column contains only fragments without a postcode |
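Since warnings is optional and non-fatal, it is easy to overlook. A caller might check for a specific code before trusting the match rate, along the lines of the sketch below. `hasWarning` and `ToolWarning` are hypothetical names, and only the fields needed for the check are modeled.

```typescript
interface ToolWarning {
  code: string;
  message: string;
}

// Only the part of the response needed for this check.
interface ResponseWithWarnings {
  warnings?: ToolWarning[];
}

// True when the response carries a warning with the given code.
function hasWarning(response: ResponseWithWarnings, code: string): boolean {
  return (response.warnings ?? []).some((warning) => warning.code === code);
}
```

A true result for `hasWarning(response, "address_parse_low_yield")` is a cue to re-check the order of the columns under address.names before rerunning.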
Example Response:
{
"export": {
"download_url": "https://s3.amazonaws.com/bucket/artifacts/550e.../resolved_rows_1705320000.csv?...",
"format": "csv",
"expires_at": "2025-01-16T13:00:00Z"
},
"stats": {
"total_rows": 5000,
"matched_rows": 4250,
"unmatched_rows": 750,
"match_rate": 0.85,
"output_column_count": 12,
"by_method": {
"composite": 2100,
"single": 1800,
"tiebreaker": 350
},
"avg_score": 0.78
},
"tool_trace_id": "a1b2c3d4e5f6",
"workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}
Common Errors
| Condition | Error message |
|---|---|
| Both csv_key and csv_resource_uri provided | "csv_key and csv_resource_uri are mutually exclusive. Provide one or the other." |
| Neither csv_key nor csv_resource_uri provided | "Either csv_key or csv_resource_uri must be provided." |
| No identifier columns specified | "At least one identifier column must be specified via lookup_columns (email, phone, address, name, linkedin, or domain)." |
| csv_resource_uri does not point to uploads or artifacts sub-path | "csv_resource_uri must point to uploads or artifacts, got: <sub-path>" |
| Deprecated email_columns key used | "'email_columns' has been removed. Use 'lookup_columns.email' instead." |
| Deprecated phone_columns key used | "'phone_columns' has been removed. Use 'lookup_columns.phone' instead." |
| Deprecated address_columns key used | "'address_columns' has been removed. Use 'lookup_columns.address' instead." |
| Deprecated name_columns key used | "'name_columns' has been removed. Use 'lookup_columns.name' instead." |
| Deprecated linkedin_columns key used | "'linkedin_columns' has been removed. Use 'lookup_columns.linkedin' instead." |
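The source-related errors in the table can be anticipated with a client-side pre-flight check. The sketch below mirrors the documented messages for convenience, but it is an approximation: `checkSource` is a hypothetical helper, the URI pattern is inferred from the examples in this document, and the server's own validation may differ in detail.

```typescript
interface SourceParams {
  csv_key?: string;
  csv_resource_uri?: string;
}

// Returns the expected error message, or null when the source parameters look valid.
function checkSource(params: SourceParams): string | null {
  const hasKey = params.csv_key !== undefined;
  const hasUri = params.csv_resource_uri !== undefined;
  if (hasKey && hasUri) {
    return "csv_key and csv_resource_uri are mutually exclusive. Provide one or the other.";
  }
  if (!hasKey && !hasUri) {
    return "Either csv_key or csv_resource_uri must be provided.";
  }
  if (params.csv_resource_uri !== undefined) {
    // Assumed shape: workflow://{workflow_id}/{sub-path}/{file}
    const match = /^workflow:\/\/[^/]+\/([^/]+)\//.exec(params.csv_resource_uri);
    const subPath = match ? match[1] : "";
    if (subPath !== "uploads" && subPath !== "artifacts") {
      return "csv_resource_uri must point to uploads or artifacts, got: " + subPath;
    }
  }
  return null;
}
```

Failing fast on the client avoids spending a tool call on a request that would be rejected anyway.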
Performance Notes
- Resolves and enriches each page in a single backend pass (no chunking within a page); output is streamed to S3 to keep memory bounded
- Files larger than 40,000 rows are processed by calling repeatedly, passing the response's next_offset back as offset until it is omitted (last page)
- Output maintains exact 1:1 row correspondence with the requested input page
Usage Examples
Example 1: Basic email resolution
{
"entity_type": "person",
"csv_resource_uri": "workflow://550e8400-e29b-41d4-a716-446655440000/uploads/customers.csv",
"lookup_columns": { "email": { "names": ["email"] } },
"domains": ["demographic"],
"workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}
Example 2: Multi-identifier with enrichment
{
"entity_type": "person",
"csv_resource_uri": "workflow://550e8400-e29b-41d4-a716-446655440000/uploads/leads.csv",
"lookup_columns": {
"email": { "names": ["work_email", "personal_email"] },
"phone": { "names": ["phone"] },
"address": { "names": ["address"] }
},
"domains": ["demographic", "employment", "financial"],
"workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}
Example 3: High-quality matches only
{
"entity_type": "person",
"csv_key": "prospects.csv",
"lookup_columns": { "email": { "names": ["email"] } },
"domains": ["demographic", "employment"],
"min_score_threshold": 0.8,
"include_unmatched": false,
"workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}
Example 4: Pre-hashed identifiers
{
"entity_type": "person",
"csv_resource_uri": "workflow://550e8400-e29b-41d4-a716-446655440000/uploads/hashed_list.csv",
"lookup_columns": {
"email": { "names": ["email_md5"], "hash_type": "md5" },
"phone": { "names": ["phone_sha256"], "hash_type": "sha256" }
},
"domains": ["demographic"],
"workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}
Example 5: LinkedIn-based person resolution
When a CSV contains LinkedIn profile URLs or slugs, use lookup_columns.linkedin to resolve by LinkedIn without falling back to fuzzy name matching (which is unreliable on common names like "John Wu"):
{
"entity_type": "person",
"csv_resource_uri": "workflow://550e8400-e29b-41d4-a716-446655440000/uploads/prospects.csv",
"lookup_columns": { "linkedin": { "names": ["linkedin_url"] } },
"domains": ["demographic", "employment"],
"workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}
Example 6: Shopify customer export with split-address columns
Shopify customer exports split the address across six columns. List them under address.names in street-first order so the server joins the row's cell values into a canonical address string before parsing.
{
"entity_type": "person",
"csv_resource_uri": "workflow://550e8400-e29b-41d4-a716-446655440000/uploads/shopify_customers.csv",
"lookup_columns": {
"email": { "names": ["Email"] },
"phone": { "names": ["Phone", "Default Address Phone"] },
"address": {
"names": [
"Default Address Address1",
"Default Address Address2",
"Default Address City",
"Default Address Province Code",
"Default Address Zip",
"Default Address Country Code"
]
},
"name": { "names": ["First Name", "Last Name"] }
},
"domains": ["demographic", "intent", "household"],
"workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}
Example 7: Domain-based business resolution
When a CSV is keyed on a company website, use lookup_columns.domain for business entities. Values can be bare domains (acme.com) or full URLs (https://www.Acme.COM/about); the server normalizes both to the bare lowercase host before matching the website identifier type. The trait-kind values in domains (industry, funding, techstack) are from the business set above.
{
"entity_type": "business",
"csv_resource_uri": "workflow://550e8400-e29b-41d4-a716-446655440000/uploads/companies.csv",
"lookup_columns": { "domain": { "names": ["website"] } },
"domains": ["industry", "funding", "techstack"],
"workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}
Common CRM column mappings
Reference lookup_columns.address.names orderings for the major CRM CSV exports. All four split address across columns; the orderings below are street-first so libpostal sees a canonical address string. Column names are case-sensitive — copy them exactly as the export emits them.
Shopify (Customers export):
{
"email": { "names": ["Email"] },
"phone": { "names": ["Phone", "Default Address Phone"] },
"address": {
"names": [
"Default Address Address1",
"Default Address Address2",
"Default Address City",
"Default Address Province Code",
"Default Address Zip",
"Default Address Country Code"
]
},
"name": { "names": ["First Name", "Last Name"] }
}
HubSpot (Contacts export):
{
"email": { "names": ["Email"] },
"phone": { "names": ["Phone Number", "Mobile Phone Number"] },
"address": {
"names": ["Street Address", "Street Address 2", "City", "State/Region", "Postal Code", "Country/Region"]
},
"name": { "names": ["First Name", "Last Name"] }
}
Salesforce (Contact / Lead report):
{
"email": { "names": ["Email"] },
"phone": { "names": ["Phone", "MobilePhone"] },
"address": {
"names": ["MailingStreet", "MailingCity", "MailingState", "MailingPostalCode", "MailingCountry"]
},
"name": { "names": ["FirstName", "LastName"] }
}
For Salesforce Leads, swap the Mailing* columns for Street, City, State, PostalCode, Country.
BigCommerce (Customer export):
{
"email": { "names": ["Email"] },
"phone": { "names": ["Phone"] },
"address": {
"names": [
"Address Line 1",
"Address Line 2",
"Suburb/City",
"State",
"Zip/Postcode",
"Country"
]
},
"name": { "names": ["First Name", "Last Name"] }
}
For each platform, drop any columns your specific export doesn't include (e.g. omit Default Address Address2 if the source doesn't capture it). Empty cells are skipped during the per-row join, so leaving a sometimes-blank column in the list is fine.
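Dropping absent columns can be automated by filtering a reference ordering against the CSV's header row. This is a minimal sketch under the assumption that the header has already been read into an array of strings; `addressColumnsFor` is a hypothetical helper. Matching is exact, in line with the case-sensitivity note above.

```typescript
// Keeps the reference ordering, minus any column the export's header does not contain.
function addressColumnsFor(header: string[], preferredOrder: string[]): string[] {
  return preferredOrder.filter((name) => header.indexOf(name) !== -1);
}
```

The result follows the reference (street-first) ordering rather than the header's column order, so a CSV whose columns appear in a different sequence still yields a correctly ordered address.names list.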