resolve_and_enrich_rows — Watt Data Docs

Resolve each row of a CSV file to a platform entity ID and optionally append enrichment data, preserving the original row structure.

Quick Example

{
  "entity_type": "person",
  "csv_resource_uri": "workflow://550e8400-e29b-41d4-a716-446655440000/uploads/customers.csv",
  "lookup_columns": { "email": { "names": ["email"] } },
  "domains": ["demographic"],
  "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}

Input Parameters

Parameter	Type	Required	Default	Constraints	Description
entity_type	string	Yes	-	"person" or "business"	Type of entity to resolve
csv_key	string	Conditional	-	Filename	CSV filename from generate_upload_url. Mutually exclusive with csv_resource_uri
csv_resource_uri	string	Conditional	-	workflow:// URI	CSV resource URI. Mutually exclusive with csv_key
lookup_columns	object	Conditional	-	See below	Column-mapping for CSV-based resolution. Groups each identifier type's column names (and hash_type for email/phone) under one object. At least one sub-key with non-empty `names` is required
tiebreaker_hierarchy	array	No	["email", "phone", "address", "linkedin", "domain", "name"]	Ordered array	Priority for divergent identifier resolution
min_score_threshold	number	No	0.0	0.0-1.0	Minimum match score threshold
domains	array	Yes	-	Enrichment domains	Enrichment domains to include in output
include_unmatched	boolean	No	true	-	Include rows with no matches in output
include_score_breakdown	boolean	No	false	-	Include detailed score breakdown per identifier
workflow_id	string	Yes	-	Valid UUID	Workflow session ID from generate_upload_url
offset	number	No	0	Integer ≥ 0	Number of CSV data rows to skip before processing. Use with `limit` to paginate large CSVs across multiple calls
limit	number	No	40000	1 ≤ limit ≤ 40000	Maximum number of CSV data rows to process in this call. To process larger files, call repeatedly with increasing `offset` values

Parameter Details:

csv_key vs csv_resource_uri:

Provide exactly one. They are mutually exclusive.
csv_resource_uri must point to the uploads or artifacts sub-path of a workflow URI (e.g. workflow://{workflow_id}/uploads/file.csv or workflow://{workflow_id}/artifacts/file.csv). Other sub-paths are rejected.
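A client-side pre-flight check can catch these source errors before a round trip. The helper below, validateCsvSource, is hypothetical and not part of the tool. It returns the error messages listed under Common Errors. It assumes URIs of the form workflow://{workflow_id}/{sub-path}/{file}.

```typescript
// Hypothetical pre-flight check; validateCsvSource is not part of the tool.
// Assumes URIs of the form workflow://{workflow_id}/{sub-path}/{file}.
type CsvSource = { csv_key?: string; csv_resource_uri?: string };

function validateCsvSource(p: CsvSource): string | null {
  const hasKey = p.csv_key !== undefined;
  const uri = p.csv_resource_uri;
  if (hasKey && uri !== undefined) {
    return "csv_key and csv_resource_uri are mutually exclusive. Provide one or the other.";
  }
  if (!hasKey && uri === undefined) {
    return "Either csv_key or csv_resource_uri must be provided.";
  }
  if (uri !== undefined) {
    // Capture the path segment immediately after the workflow ID.
    const match = /^workflow:\/\/[^/]+\/([^/]+)\//.exec(uri);
    const subPath = match ? match[1] : "";
    if (subPath !== "uploads" && subPath !== "artifacts") {
      return `csv_resource_uri must point to uploads or artifacts, got: ${subPath}`;
    }
  }
  return null; // null means the source parameters look valid
}
```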

offset and limit:

The CSV is processed in pages of up to 40,000 rows per call. When more rows remain, the response includes a next_offset field; pass it back as offset on the next call. The field is omitted on the last page.
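A small loop can drive this paging contract. In the sketch below, callTool is a hypothetical stand-in for whatever client invokes resolve_and_enrich_rows; it is not a documented function. Only offset, limit, next_offset and export.download_url come from this page.

```typescript
// Hypothetical paging driver. `callTool` is a placeholder for your own client
// call to resolve_and_enrich_rows; it is not part of the documented API.
type PageResult = { export: { download_url: string }; next_offset?: number };
type PageParams = Record<string, unknown> & { offset?: number; limit?: number };

async function resolveAllPages(
  base: PageParams,
  callTool: (params: PageParams) => Promise<PageResult>,
): Promise<string[]> {
  const urls: string[] = [];
  let offset: number | undefined = base.offset ?? 0;
  while (offset !== undefined) {
    const res: PageResult = await callTool({ ...base, offset });
    urls.push(res.export.download_url); // one presigned CSV per page
    offset = res.next_offset; // omitted on the last page, which ends the loop
  }
  return urls;
}
```

The URLs are collected in offset order, so the pages can be downloaded and concatenated in that order.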

lookup_columns:

Maps CSV columns to identifier types. The same shape is used by entity_resolve — see Conventions → CSV Column Mapping for the canonical reference, including per-identifier rules, multi-column address joining, and the address_parse_low_yield warning.

{
  email?:    { names: string[]; hash_type?: "plaintext" | "md5" | "sha1" | "sha256" },
  phone?:    { names: string[]; hash_type?: "plaintext" | "md5" | "sha1" | "sha256" },
  address?:  { names: string[] },
  name?:     { names: string[] },
  linkedin?: { names: string[] },   // resolves via the `social:linkedin` identifier type
  domain?:   { names: string[] }    // business entities — resolves via the `website` identifier type
}

At least one sub-key with non-empty names is required. Only email and phone accept hash_type; the other types require plaintext values (address parsing, name fuzzy-matching, LinkedIn slug normalization, and domain normalization cannot run on hashed inputs).
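The two rules above can be checked locally before submitting a request. checkLookupColumns is a hypothetical helper. It takes loosely typed input on purpose, so it can flag mistakes that a typed schema would otherwise hide, such as hash_type placed on an address mapping.

```typescript
// Hypothetical client-side check of the documented lookup_columns rules.
type ColumnSpec = { names?: string[]; hash_type?: string };

const HASH_TYPES = new Set(["plaintext", "md5", "sha1", "sha256"]);

function checkLookupColumns(cols: Record<string, ColumnSpec | undefined>): string[] {
  const problems: string[] = [];
  const entries = Object.entries(cols);
  // Rule 1: at least one sub-key must list a non-empty set of column names.
  const hasNames = entries.some(([, spec]) => (spec?.names?.length ?? 0) > 0);
  if (!hasNames) problems.push("at least one sub-key needs non-empty names");
  // Rule 2: only email and phone accept hash_type, and only the listed values.
  for (const [key, spec] of entries) {
    const h = spec?.hash_type;
    if (h === undefined) continue;
    if (key !== "email" && key !== "phone") {
      problems.push(`${key} does not accept hash_type`);
    } else if (!HASH_TYPES.has(h)) {
      problems.push(`${key}: unsupported hash_type "${h}"`);
    }
  }
  return problems;
}
```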

Address columns (single vs split):

One column: each cell must be a complete address line, e.g. "123 Main St, San Francisco CA 94110, US".
Multiple columns: most CRM exports (Shopify, HubSpot, Salesforce, BigCommerce) split the address across several columns. List them under address.names in street-first order (address1, address2?, city?, region?, postcode, country?); the server joins the row's non-empty cell values with ", " before libpostal parsing.
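To preview the string libpostal will receive for a given row, you can reproduce the join locally. joinAddressCells is a hypothetical helper. Trimming whitespace and treating a missing column as empty are assumptions, not documented server behavior.

```typescript
// Sketch of the per-row join: listed cells, in list order, joined with ", ".
// Empty cells are skipped; trimming and missing-column handling are assumptions.
function joinAddressCells(row: Record<string, string | undefined>, names: string[]): string {
  return names
    .map((col) => (row[col] ?? "").trim())
    .filter((cell) => cell.length > 0)
    .join(", ");
}
```

Printing the joined string for a few sample rows is a quick way to catch a wrong-order list before you submit.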

Column order affects match quality, not just parse success: a wrong-order list often parses but maps components to the wrong slots and depresses match rates silently. Verify ordering whenever match rates are below expectation.

If more than 50% of address values fail libpostal parsing, the response includes a warnings[] entry with code address_parse_low_yield.
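A client can surface this warning explicitly, so a low match rate does not pass unnoticed. Both helpers below are hypothetical. The threshold function only restates the documented rule; it does not run libpostal.

```typescript
type ToolWarning = { code: string; message: string };

// Restates the documented rule: the warning fires when MORE than half of the
// address values fail to parse, so exactly 50% does not trigger it.
function exceedsLowYieldThreshold(failed: number, total: number): boolean {
  return total > 0 && failed / total > 0.5;
}

// warnings is optional in the response; a missing array means no warnings.
function findAddressWarning(res: { warnings?: ToolWarning[] }): ToolWarning | undefined {
  return res.warnings?.find((w) => w.code === "address_parse_low_yield");
}
```

If findAddressWarning returns a value, re-check the order of the columns under address.names before re-running.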

Migration from legacy *_columns keys: The flat email_columns, phone_columns, address_columns, name_columns, and linkedin_columns parameters from earlier V2 betas are rejected with a per-key error naming the lookup_columns.<key> replacement. Update existing callers to the nested shape.
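Callers still sending the flat keys can be updated mechanically. migrateLegacyColumns is a hypothetical helper. It assumes each legacy *_columns value was a plain array of column names, and it does not attempt to carry over any legacy hashing options.

```typescript
// Hypothetical helper: rewrites the removed flat keys into lookup_columns.
// Assumption: each legacy *_columns value was an array of column names.
const LEGACY_TO_NESTED: Record<string, string> = {
  email_columns: "email",
  phone_columns: "phone",
  address_columns: "address",
  name_columns: "name",
  linkedin_columns: "linkedin",
};

function migrateLegacyColumns(params: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  const migrated: Record<string, { names: string[] }> = {};
  for (const [key, value] of Object.entries(params)) {
    const target = Object.prototype.hasOwnProperty.call(LEGACY_TO_NESTED, key)
      ? LEGACY_TO_NESTED[key]
      : undefined;
    if (target !== undefined) {
      migrated[target] = { names: value as string[] };
    } else {
      out[key] = value; // non-legacy keys pass through unchanged
    }
  }
  if (Object.keys(migrated).length > 0) {
    const existing = (out["lookup_columns"] ?? {}) as Record<string, unknown>;
    // Entries already present under lookup_columns take precedence.
    out["lookup_columns"] = { ...migrated, ...existing };
  }
  return out;
}
```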

tiebreaker_hierarchy:

When multiple identifiers resolve to different entities, the hierarchy determines which entity wins.
Default: ["email", "phone", "address", "linkedin", "domain", "name"] (email has highest priority).
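The sketch below illustrates how an ordered hierarchy selects a winner. It is not the server's implementation. The input, a map from identifier type to the entity ID that identifier resolved to, is an assumed representation. The winning type corresponds conceptually to the _tiebreaker_winner output column.

```typescript
// Illustrative only: priority-order selection among divergent identifiers.
type IdentifierType = "email" | "phone" | "address" | "linkedin" | "domain" | "name";

const DEFAULT_HIERARCHY: IdentifierType[] = ["email", "phone", "address", "linkedin", "domain", "name"];

function pickTiebreakWinner(
  candidates: Partial<Record<IdentifierType, string>>, // identifier type -> resolved entity ID
  hierarchy: IdentifierType[] = DEFAULT_HIERARCHY,
): { winner: IdentifierType; entityId: string } | null {
  // Walk the hierarchy from highest priority; the first type with a match wins.
  for (const type of hierarchy) {
    const entityId = candidates[type];
    if (entityId !== undefined) return { winner: type, entityId };
  }
  return null; // no identifier resolved
}
```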

domains (enrichment):

domains accepts both identifier-kind and trait-kind values. Identifier-kind values (name, email, phone, address, maid, website, social) are accepted regardless of entity_type. Allowed trait-kind values depend on entity_type:

person → affinity, content, demographic, employment, financial, household, intent, interest, lifestyle, political, purchase
business → about, appstore, digital, funding, hiring, industry, techstack

Note: domains accepts any string at the input boundary, so values outside the allowed set for the chosen entity_type are not rejected — they are simply not enriched. The column is created but populated with no data, which is easy to mistake for missing source data. Match the value to the right entity type to get a populated column.
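The server silently accepts out-of-set values, so a client-side check is a cheap guard against empty columns. unenrichableDomains is a hypothetical helper. Its lists are copied from this page and may drift from what the server accepts.

```typescript
// Lists copied from this page; they may drift from the server's allowed sets.
const IDENTIFIER_KINDS = new Set(["name", "email", "phone", "address", "maid", "website", "social"]);
const TRAIT_KINDS: Record<"person" | "business", Set<string>> = {
  person: new Set(["affinity", "content", "demographic", "employment", "financial", "household",
    "intent", "interest", "lifestyle", "political", "purchase"]),
  business: new Set(["about", "appstore", "digital", "funding", "hiring", "industry", "techstack"]),
};

// Returns the domains values that would produce an empty enrichment column.
function unenrichableDomains(entityType: "person" | "business", domains: string[]): string[] {
  return domains.filter((d) => !IDENTIFIER_KINDS.has(d) && !TRAIT_KINDS[entityType].has(d));
}
```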

Each value in domains adds an enrichment column to the output CSV, named after the domain with a _ prefix (for example _demographic).

Output columns added to CSV:

_entity_id - Resolved entity ID
_match_score - Overall match confidence (0-1)
_match_method - How the match was made (composite, single, tiebreaker)
_matched_identifiers - Which identifiers matched
_tiebreaker_winner - Which identifier type won the tiebreak (always present in the CSV header; only populated for rows resolved via the tiebreaker method)
_{domain} - Domain enrichment data (when domains specified)
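After downloading and parsing the output CSV, you can separate the added columns and tally rows by match method. Both helpers are hypothetical. They assume rows are already parsed into objects keyed by header name (CSV parsing is out of scope) and that unmatched rows carry an empty _entity_id.

```typescript
type OutputRow = Record<string, string>;

const FIXED_COLUMNS = new Set([
  "_entity_id", "_match_score", "_match_method", "_matched_identifiers", "_tiebreaker_winner",
]);

// Enrichment columns are the remaining "_"-prefixed headers. Caveat: an input
// column whose own name starts with "_" would also be picked up here.
function enrichmentColumns(header: string[]): string[] {
  return header.filter((h) => h.startsWith("_") && !FIXED_COLUMNS.has(h));
}

// Assumes unmatched rows have an empty _entity_id; unknown methods are ignored.
function tallyMethods(rows: OutputRow[]) {
  const tally = { composite: 0, single: 0, tiebreaker: 0, unmatched: 0 };
  for (const row of rows) {
    const method = row["_match_method"];
    if (!row["_entity_id"]) {
      tally.unmatched += 1;
    } else if (method === "composite" || method === "single" || method === "tiebreaker") {
      tally[method] += 1;
    }
  }
  return tally;
}
```

For a full page, the tally should line up with stats.by_method in the response.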

Request Schema:

interface ResolveAndEnrichRowsParams {
  entity_type: "person" | "business";
  csv_key?: string;
  csv_resource_uri?: string;
  lookup_columns?: {
    email?: { names: string[]; hash_type?: "plaintext" | "md5" | "sha1" | "sha256" };
    phone?: { names: string[]; hash_type?: "plaintext" | "md5" | "sha1" | "sha256" };
    address?: { names: string[] };
    name?: { names: string[] };
    linkedin?: { names: string[] };
    domain?: { names: string[] };
  };
  tiebreaker_hierarchy?: Array<"email" | "phone" | "address" | "name" | "linkedin" | "domain">;
  min_score_threshold?: number;
  domains: string[];
  include_unmatched?: boolean;
  include_score_breakdown?: boolean;
  workflow_id: string;
  offset?: number;
  limit?: number;
}
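The numeric and ID constraints in the parameter table can be checked before a call. checkParams is a hypothetical helper, and the server remains the source of truth. Treating offset and limit as integers follows the table's "Integer ≥ 0" and row-count wording.

```typescript
// Hypothetical pre-flight check of the constraints in the parameter table.
type ConstrainedParams = {
  workflow_id: string;
  min_score_threshold?: number;
  offset?: number;
  limit?: number;
};

const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function checkParams(p: ConstrainedParams): string[] {
  const errors: string[] = [];
  if (!UUID_RE.test(p.workflow_id)) errors.push("workflow_id must be a UUID");
  const t = p.min_score_threshold;
  if (t !== undefined && !(t >= 0 && t <= 1)) errors.push("min_score_threshold must be in [0, 1]");
  const o = p.offset;
  if (o !== undefined && !(Number.isInteger(o) && o >= 0)) errors.push("offset must be an integer >= 0");
  const l = p.limit;
  if (l !== undefined && !(Number.isInteger(l) && l >= 1 && l <= 40000)) {
    errors.push("limit must be an integer in [1, 40000]");
  }
  return errors;
}
```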

Output Format

{
  export: {
    download_url: string;
    format: "csv";
    expires_at: string;
  },
  stats: {
    total_rows: number;
    matched_rows: number;
    unmatched_rows: number;
    match_rate: number;
    output_column_count: number;
    by_method: {
      composite: number;
      single: number;
      tiebreaker: number;
    };
    avg_score: number;
  },
  next_offset?: number,
  warnings?: Array<{ code: string; message: string }>,
  tool_trace_id: string,
  workflow_id: string
}

Response Fields:

Field	Type	Description
export.download_url	string	Presigned URL for enriched CSV
export.format	string	Always "csv"
export.expires_at	string	ISO 8601 expiration timestamp
stats.total_rows	number	Total input rows processed
stats.matched_rows	number	Rows with successful matches
stats.unmatched_rows	number	Rows with no matches
stats.match_rate	number	Match rate (0-1)
stats.output_column_count	number	Number of columns in output CSV
stats.by_method.composite	number	Rows matched via multiple identifiers
stats.by_method.single	number	Rows matched via single identifier
stats.by_method.tiebreaker	number	Rows resolved via tiebreaker
stats.avg_score	number	Average match score
warnings	array	Optional. Non-fatal warnings, e.g. `address_parse_low_yield` when most address values failed libpostal parsing. This is typically a sign that the columns under `address.names` were listed in the wrong order, or that a single mapped column contains only fragments without a postcode
next_offset	number	Optional. Value to pass as `offset` on the next call when more rows remain; omitted on the last page
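The stats fields should be mutually consistent, and a quick check can flag a response worth a closer look. The relationships below are inferred from the field descriptions rather than guaranteed by the API. statsLookConsistent is a hypothetical helper, and a mismatch is a cue to investigate, not proof of an error.

```typescript
type Stats = {
  total_rows: number;
  matched_rows: number;
  unmatched_rows: number;
  match_rate: number;
  by_method: { composite: number; single: number; tiebreaker: number };
};

// Inferred invariants: matched + unmatched = total, methods sum to matched,
// and match_rate = matched / total (within a small floating-point tolerance).
function statsLookConsistent(s: Stats, tolerance = 1e-6): boolean {
  const methods = s.by_method.composite + s.by_method.single + s.by_method.tiebreaker;
  const rate = s.total_rows === 0 ? 0 : s.matched_rows / s.total_rows;
  return (
    s.matched_rows + s.unmatched_rows === s.total_rows &&
    methods === s.matched_rows &&
    Math.abs(rate - s.match_rate) <= tolerance
  );
}
```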

Example Response:

{
  "export": {
    "download_url": "https://s3.amazonaws.com/bucket/artifacts/550e.../resolved_rows_1705320000.csv?...",
    "format": "csv",
    "expires_at": "2025-01-16T13:00:00Z"
  },
  "stats": {
    "total_rows": 5000,
    "matched_rows": 4250,
    "unmatched_rows": 750,
    "match_rate": 0.85,
    "output_column_count": 12,
    "by_method": {
      "composite": 2100,
      "single": 1800,
      "tiebreaker": 350
    },
    "avg_score": 0.78
  },
  "tool_trace_id": "a1b2c3d4e5f6",
  "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}

Common Errors

Condition	Error message
Both `csv_key` and `csv_resource_uri` provided	`"csv_key and csv_resource_uri are mutually exclusive. Provide one or the other."`
Neither `csv_key` nor `csv_resource_uri` provided	`"Either csv_key or csv_resource_uri must be provided."`
No identifier columns specified	`"At least one identifier column must be specified via lookup_columns (email, phone, address, name, linkedin, or domain)."`
`csv_resource_uri` does not point to `uploads` or `artifacts` sub-path	`"csv_resource_uri must point to uploads or artifacts, got: <sub-path>"`
Deprecated `email_columns` key used	`"'email_columns' has been removed. Use 'lookup_columns.email' instead."`
Deprecated `phone_columns` key used	`"'phone_columns' has been removed. Use 'lookup_columns.phone' instead."`
Deprecated `address_columns` key used	`"'address_columns' has been removed. Use 'lookup_columns.address' instead."`
Deprecated `name_columns` key used	`"'name_columns' has been removed. Use 'lookup_columns.name' instead."`
Deprecated `linkedin_columns` key used	`"'linkedin_columns' has been removed. Use 'lookup_columns.linkedin' instead."`

Performance Notes

Resolves and enriches each page in a single backend pass (no chunking within a page); output is streamed to S3 to keep memory bounded
Files larger than 40,000 rows are processed by calling repeatedly, passing the response's next_offset back as offset until it is omitted (last page)
With include_unmatched set to true (the default), output maintains exact 1:1 row correspondence with the requested input page; setting it to false drops rows with no match
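With include_unmatched left at its default, each page's output lines up row-for-row with its input page, so concatenating page CSVs in offset order reconstructs the full file. stitchPages is a hypothetical helper. It assumes every page repeats the same header line and uses "\n" line endings. It strips only the first line of later pages and does not parse CSV, so quoted newlines in data rows are safe but quoted newlines inside the header are not.

```typescript
// Hypothetical helper: concatenates per-page CSV text, keeping one header.
function stitchPages(pages: string[]): string {
  if (pages.length === 0) return "";
  const out: string[] = [];
  pages.forEach((page, i) => {
    const body = page.endsWith("\n") ? page.slice(0, -1) : page;
    if (i === 0) {
      out.push(body); // keep the first page's header and rows
      return;
    }
    const firstNewline = body.indexOf("\n");
    if (firstNewline !== -1) out.push(body.slice(firstNewline + 1)); // drop repeated header
  });
  return out.join("\n") + "\n";
}
```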

Usage Examples

Example 1: Basic email resolution

{
  "entity_type": "person",
  "csv_resource_uri": "workflow://550e8400-e29b-41d4-a716-446655440000/uploads/customers.csv",
  "lookup_columns": { "email": { "names": ["email"] } },
  "domains": ["demographic"],
  "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}

Example 2: Multi-identifier with enrichment

{
  "entity_type": "person",
  "csv_resource_uri": "workflow://550e8400-e29b-41d4-a716-446655440000/uploads/leads.csv",
  "lookup_columns": {
    "email": { "names": ["work_email", "personal_email"] },
    "phone": { "names": ["phone"] },
    "address": { "names": ["address"] }
  },
  "domains": ["demographic", "employment", "financial"],
  "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}

Example 3: High-quality matches only

{
  "entity_type": "person",
  "csv_key": "prospects.csv",
  "lookup_columns": { "email": { "names": ["email"] } },
  "domains": ["demographic", "employment"],
  "min_score_threshold": 0.8,
  "include_unmatched": false,
  "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}

Example 4: Pre-hashed identifiers

{
  "entity_type": "person",
  "csv_resource_uri": "workflow://550e8400-e29b-41d4-a716-446655440000/uploads/hashed_list.csv",
  "lookup_columns": {
    "email": { "names": ["email_md5"], "hash_type": "md5" },
    "phone": { "names": ["phone_sha256"], "hash_type": "sha256" }
  },
  "domains": ["demographic"],
  "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}
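Example 4 assumes the hashed columns already exist. If you produce them yourself, the sketch below uses Node's built-in crypto module. The normalization before hashing is an assumption, not a documented requirement: emails are trimmed and lowercased, and phone numbers keep digits only. Confirm the expected pre-hash format before hashing a production list.

```typescript
import { createHash } from "node:crypto";

type Algo = "md5" | "sha1" | "sha256";

// Assumed normalization: trim + lowercase emails before hashing.
function hashEmail(email: string, algo: Algo): string {
  return createHash(algo).update(email.trim().toLowerCase()).digest("hex");
}

// Assumed normalization: keep digits only (country-code handling is not covered).
function hashPhone(phone: string, algo: Algo): string {
  return createHash(algo).update(phone.replace(/\D/g, "")).digest("hex");
}
```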

Example 5: LinkedIn-based person resolution

When a CSV contains LinkedIn profile URLs or slugs, use lookup_columns.linkedin to resolve by LinkedIn without falling back to fuzzy name matching (which is unreliable on common names like "John Wu"):

{
  "entity_type": "person",
  "csv_resource_uri": "workflow://550e8400-e29b-41d4-a716-446655440000/uploads/prospects.csv",
  "lookup_columns": { "linkedin": { "names": ["linkedin_url"] } },
  "domains": ["demographic", "employment"],
  "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}
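The server performs its own LinkedIn slug normalization. A local extractor is still handy for spot-checking or de-duplicating a column before upload. linkedinSlug is a hypothetical helper. It assumes profile URLs of the form linkedin.com/in/&lt;slug&gt;, passes bare slugs through, and lowercases the result, which is itself an assumption.

```typescript
// Hypothetical slug extractor for spot-checking a LinkedIn column locally.
function linkedinSlug(value: string): string | null {
  const v = value.trim();
  if (v === "") return null;
  // Profile URL: take the segment after /in/ up to the next /, ? or #.
  const m = /linkedin\.com\/in\/([^/?#]+)/i.exec(v);
  if (m) return m[1].toLowerCase();
  // Otherwise accept a bare slug (no slashes or whitespace).
  return /^[^/\s]+$/.test(v) ? v.toLowerCase() : null;
}
```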

Example 6: Shopify customer export with split-address columns

Shopify customer exports split the address across six columns. List them under address.names in street-first order so the server joins the row's cell values into a canonical address string before parsing.

{
  "entity_type": "person",
  "csv_resource_uri": "workflow://550e8400-e29b-41d4-a716-446655440000/uploads/shopify_customers.csv",
  "lookup_columns": {
    "email": { "names": ["Email"] },
    "phone": { "names": ["Phone", "Default Address Phone"] },
    "address": {
      "names": [
        "Default Address Address1",
        "Default Address Address2",
        "Default Address City",
        "Default Address Province Code",
        "Default Address Zip",
        "Default Address Country Code"
      ]
    },
    "name": { "names": ["First Name", "Last Name"] }
  },
  "domains": ["demographic", "intent", "household"],
  "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}

Example 7: Domain-based business resolution

When a CSV is keyed on a company website, use lookup_columns.domain for business entities. Values can be bare domains (acme.com) or full URLs (https://www.Acme.COM/about); the server normalizes both to the bare lowercase host before matching the website identifier type. The trait-kind values in domains (industry, funding, techstack) are from the business set above.

{
  "entity_type": "business",
  "csv_resource_uri": "workflow://550e8400-e29b-41d4-a716-446655440000/uploads/companies.csv",
  "lookup_columns": { "domain": { "names": ["website"] } },
  "domains": ["industry", "funding", "techstack"],
  "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}
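To preview what a domain cell reduces to, the sketch below uses the standard URL parser. normalizeDomain is a hypothetical helper. Stripping a leading "www." is an assumption inferred from the acme.com example; this page only states that values become the bare lowercase host.

```typescript
// Hypothetical preview of domain normalization using the WHATWG URL parser.
function normalizeDomain(value: string): string | null {
  const trimmed = value.trim();
  if (trimmed === "") return null;
  // Bare domains have no scheme; add one so URL can parse them.
  const withScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(trimmed) ? trimmed : `https://${trimmed}`;
  let host: string;
  try {
    host = new URL(withScheme).hostname.toLowerCase();
  } catch {
    return null; // unparseable value
  }
  return host.startsWith("www.") ? host.slice(4) : host; // assumed www stripping
}
```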

Common CRM column mappings

Reference lookup_columns.address.names orderings for the major CRM CSV exports. All four exports split the address across several columns; the orderings below are street-first so libpostal sees a canonical address string. Column names are case-sensitive, so copy them exactly as the export emits them.

Shopify (Customers export):

{
  "email": { "names": ["Email"] },
  "phone": { "names": ["Phone", "Default Address Phone"] },
  "address": {
    "names": [
      "Default Address Address1",
      "Default Address Address2",
      "Default Address City",
      "Default Address Province Code",
      "Default Address Zip",
      "Default Address Country Code"
    ]
  },
  "name": { "names": ["First Name", "Last Name"] }
}

HubSpot (Contacts export):

{
  "email": { "names": ["Email"] },
  "phone": { "names": ["Phone Number", "Mobile Phone Number"] },
  "address": {
    "names": ["Street Address", "Street Address 2", "City", "State/Region", "Postal Code", "Country/Region"]
  },
  "name": { "names": ["First Name", "Last Name"] }
}

Salesforce (Contact / Lead report):

{
  "email": { "names": ["Email"] },
  "phone": { "names": ["Phone", "MobilePhone"] },
  "address": {
    "names": ["MailingStreet", "MailingCity", "MailingState", "MailingPostalCode", "MailingCountry"]
  },
  "name": { "names": ["FirstName", "LastName"] }
}

For Salesforce Leads, swap the Mailing* columns for Street, City, State, PostalCode, Country.

BigCommerce (Customer export):

{
  "email": { "names": ["Email"] },
  "phone": { "names": ["Phone"] },
  "address": {
    "names": [
      "Address Line 1",
      "Address Line 2",
      "Suburb/City",
      "State",
      "Zip/Postcode",
      "Country"
    ]
  },
  "name": { "names": ["First Name", "Last Name"] }
}

For each platform, drop any columns your specific export doesn't include (e.g. omit Default Address Address2 if the source doesn't capture it). Empty cells are skipped during the per-row join, so leaving a sometimes-blank column in the list is fine.
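Trimming a reference mapping to a specific export can be automated against the CSV header. fitToHeader is a hypothetical helper. It keeps mapped names that match a header exactly (case-sensitive) and preserves list order. For each missing name it reports any header that differs only in case.

```typescript
// Hypothetical helper: fit a reference column list to an actual CSV header.
function fitToHeader(names: string[], header: string[]) {
  const present = new Set(header);
  const byLower = new Map(header.map((h): [string, string] => [h.toLowerCase(), h]));
  const kept: string[] = [];
  const missing: { name: string; didYouMean?: string }[] = [];
  for (const n of names) {
    if (present.has(n)) {
      kept.push(n); // exact match, order preserved
    } else {
      // Suggest a header that differs only in case, if one exists.
      missing.push({ name: n, didYouMean: byLower.get(n.toLowerCase()) });
    }
  }
  return { kept, missing };
}
```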