Watt Data

Resolve each row of a CSV file to a platform entity ID and optionally append enrichment data, preserving the original row structure.

Quick Example

{
  "entity_type": "person",
  "csv_resource_uri": "workflow://550e8400-e29b-41d4-a716-446655440000/uploads/customers.csv",
  "lookup_columns": { "email": { "names": ["email"] } },
  "domains": ["demographic"],
  "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}

Input Parameters

| Parameter | Type | Required | Default | Constraints | Description |
| --- | --- | --- | --- | --- | --- |
| entity_type | string | Yes | - | "person" or "business" | Type of entity to resolve |
| csv_key | string | Conditional | - | Filename | CSV filename from generate_upload_url. Mutually exclusive with csv_resource_uri |
| csv_resource_uri | string | Conditional | - | workflow:// URI | CSV resource URI. Mutually exclusive with csv_key |
| lookup_columns | object | Conditional | - | See below | Column-mapping for CSV-based resolution. Groups each identifier type's column names (and hash_type for email/phone) under one object. At least one sub-key with non-empty names is required |
| tiebreaker_hierarchy | array | No | ["email", "phone", "address", "linkedin", "domain", "name"] | Ordered array | Priority for divergent identifier resolution |
| min_score_threshold | number | No | 0.0 | 0.0-1.0 | Minimum match score threshold |
| domains | array | Yes | - | Enrichment domains | Enrichment domains to include in output |
| include_unmatched | boolean | No | true | - | Include rows with no matches in output |
| include_score_breakdown | boolean | No | false | - | Include detailed score breakdown per identifier |
| workflow_id | string | Yes | - | Valid UUID | Workflow session ID from generate_upload_url |
| offset | number | No | 0 | Integer ≥ 0 | Number of CSV data rows to skip before processing. Use with limit to paginate large CSVs across multiple calls |
| limit | number | No | 40000 | 1 ≤ limit ≤ 40000 | Maximum number of CSV data rows to process in this call. To process larger files, call repeatedly with increasing offset values |

Parameter Details:

csv_key vs csv_resource_uri:

  • Provide exactly one. They are mutually exclusive.
  • csv_resource_uri must point to the uploads or artifacts sub-path of a workflow URI (e.g. workflow://{workflow_id}/uploads/file.csv or workflow://{workflow_id}/artifacts/file.csv). Other sub-paths are rejected.
  • At least one identifier column must be specified via lookup_columns.
  • The CSV is processed in pages of up to 40,000 rows per call. When more rows remain, the response includes a next_offset field — pass it back as offset on the next call. The field is omitted on the last page.
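The paging rule above can be sketched as a small loop. This is a minimal illustration, not the platform's client: callTool is a hypothetical function standing in for however your client invokes the tool, and the sketch assumes next_offset sits at the top level of the response, which the output schema on this page does not confirm.

```typescript
// Only the fields this sketch relies on; the real response carries more.
// Assumption: next_offset is a top-level response field.
interface PageResponse {
  export: { download_url: string };
  next_offset?: number;
}

// Hypothetical transport: replace with your own tool-invocation function.
type CallTool = (params: Record<string, unknown>) => Promise<PageResponse>;

// Calls the tool once per page and collects each page's download URL.
async function resolveAllPages(
  callTool: CallTool,
  baseParams: Record<string, unknown>,
  limit: number = 40000,
): Promise<string[]> {
  const downloadUrls: string[] = [];
  let offset: number | undefined = 0;
  while (offset !== undefined) {
    const page: PageResponse = await callTool({ ...baseParams, offset, limit });
    downloadUrls.push(page.export.download_url);
    // next_offset is omitted on the last page, which ends the loop.
    offset = page.next_offset;
  }
  return downloadUrls;
}
```

Each page yields its own export, so a caller processing a file larger than the limit ends up with one download URL per page.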

lookup_columns:

Maps CSV columns to identifier types. The same shape is used by entity_resolve — see Conventions → CSV Column Mapping for the canonical reference, including per-identifier rules, multi-column address joining, and the address_parse_low_yield warning.

{
  email?:    { names: string[]; hash_type?: "plaintext" | "md5" | "sha1" | "sha256" },
  phone?:    { names: string[]; hash_type?: "plaintext" | "md5" | "sha1" | "sha256" },
  address?:  { names: string[] },
  name?:     { names: string[] },
  linkedin?: { names: string[] },   // resolves via the `social:linkedin` identifier type
  domain?:   { names: string[] }    // business entities — resolves via the `website` identifier type
}

At least one sub-key with non-empty names is required. Only email and phone accept hash_type; the other types require plaintext values (address parsing, name fuzzy-matching, LinkedIn slug normalization, and domain normalization cannot run on hashed inputs).
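A client-side pre-check for these two rules might look like the sketch below. The function name and the problem strings are illustrative only; the server's own validation and messages may differ.

```typescript
type HashType = "plaintext" | "md5" | "sha1" | "sha256";
type LookupKey = "email" | "phone" | "address" | "name" | "linkedin" | "domain";
interface ColumnGroup {
  names: string[];
  hash_type?: HashType;
}
type LookupColumns = Partial<Record<LookupKey, ColumnGroup>>;

// Only these identifier types accept hash_type.
const HASHABLE: LookupKey[] = ["email", "phone"];

// Returns a list of problems; an empty list means both rules are satisfied.
function checkLookupColumns(lookup: LookupColumns): string[] {
  const problems: string[] = [];
  const keys = Object.keys(lookup) as LookupKey[];

  // Rule 1: at least one sub-key with non-empty names.
  const hasNames = keys.some((key) => (lookup[key]?.names.length ?? 0) > 0);
  if (!hasNames) {
    problems.push("at least one sub-key with non-empty names is required");
  }

  // Rule 2: hash_type is only valid on email and phone.
  for (const key of keys) {
    if (lookup[key]?.hash_type !== undefined && !HASHABLE.includes(key)) {
      problems.push(`${key} does not accept hash_type`);
    }
  }
  return problems;
}
```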

Address columns (single vs split):

  • One column — each cell must be a complete address line, e.g. "123 Main St, San Francisco CA 94110, US".
  • Multiple columns (most CRM exports — Shopify, HubSpot, Salesforce, BigCommerce — split address across several columns) — list them under address.names in street-first order (address1, address2?, city?, region?, postcode, country?); the server joins the row's cell values with ", " before libpostal parsing.

Column order affects match quality, not just parse success: a wrong-order list often parses but maps components to the wrong slots and depresses match rates silently. Verify ordering whenever match rates are below expectation.

If more than 50% of address values fail libpostal parsing, the response includes a warnings[] entry with code address_parse_low_yield.
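To preview what the parser will receive for a given column order, the per-row join can be approximated locally. The sketch below is an approximation under stated assumptions: it joins with ", " and skips empty cells as this page describes, but the whitespace trimming and both function names are assumptions, and the libpostal parse itself is not reproduced.

```typescript
type CsvRow = Record<string, string | undefined>;

// Joins the cells listed in address.names, in the given order, skipping blanks.
// Assumption: cells are trimmed before the emptiness check.
function joinAddress(row: CsvRow, names: string[]): string {
  return names
    .map((name) => (row[name] ?? "").trim())
    .filter((cell) => cell.length > 0)
    .join(", ");
}

// Mirrors the warning threshold: strictly more than 50% of values failed.
function isLowYield(failedCount: number, totalCount: number): boolean {
  return totalCount > 0 && failedCount / totalCount > 0.5;
}
```

Printing joinAddress output for a handful of rows is a quick way to spot a wrong-order list before match rates suffer.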

Migration from legacy *_columns keys: The flat email_columns, phone_columns, address_columns, name_columns, and linkedin_columns parameters from earlier V2 betas are rejected with a per-key error naming the lookup_columns.<key> replacement. Update existing callers to the nested shape.
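Callers that still build the flat keys can be adapted with a small shim like the one below. It is a sketch that assumes each legacy value was a plain array of column names; the page does not document the legacy value shape, and migrateLegacyParams is a hypothetical helper name.

```typescript
// Legacy flat key -> lookup_columns sub-key, as listed in the migration note.
const LEGACY_KEYS = {
  email_columns: "email",
  phone_columns: "phone",
  address_columns: "address",
  name_columns: "name",
  linkedin_columns: "linkedin",
} as const;

type LegacyKey = keyof typeof LEGACY_KEYS;

// Moves legacy *_columns keys under lookup_columns and copies everything else.
function migrateLegacyParams(params: Record<string, unknown>): Record<string, unknown> {
  const migrated: Record<string, unknown> = {};
  const lookup: Record<string, { names: string[] }> = {};

  for (const key of Object.keys(params)) {
    if (key in LEGACY_KEYS) {
      // Assumption: the legacy value is a flat array of column names.
      lookup[LEGACY_KEYS[key as LegacyKey]] = { names: params[key] as string[] };
    } else {
      migrated[key] = params[key];
    }
  }

  if (Object.keys(lookup).length > 0) {
    // Keep any nested entries the caller already supplied; migrated keys win on conflict.
    const existing = migrated["lookup_columns"] as object | undefined;
    migrated["lookup_columns"] = { ...existing, ...lookup };
  }
  return migrated;
}
```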

tiebreaker_hierarchy:

  • When multiple identifiers resolve to different entities, the hierarchy determines which entity wins.
  • Default: ["email", "phone", "address", "linkedin", "domain", "name"] (email has highest priority).
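The priority rule can be illustrated with a few lines of code. This is only a sketch of the ordering idea: it picks the entity from the highest-priority identifier type that resolved at all, and makes no claim about how the server combines the hierarchy with match scores. pickTiebreakWinner and the entity IDs are hypothetical.

```typescript
type IdentifierType = "email" | "phone" | "address" | "linkedin" | "domain" | "name";

const DEFAULT_HIERARCHY: IdentifierType[] = [
  "email", "phone", "address", "linkedin", "domain", "name",
];

// candidates maps each identifier type to the entity ID it resolved to for one row.
function pickTiebreakWinner(
  candidates: Partial<Record<IdentifierType, string>>,
  hierarchy: IdentifierType[] = DEFAULT_HIERARCHY,
): { winner: IdentifierType; entityId: string } | undefined {
  for (const type of hierarchy) {
    const entityId = candidates[type];
    if (entityId !== undefined) {
      return { winner: type, entityId };
    }
  }
  return undefined; // no identifier resolved
}
```

With the default hierarchy, a row whose email and phone resolve to different entities keeps the email's entity; passing ["phone", "email"] reverses that outcome.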

domains (enrichment):

domains accepts both identifier-kind and trait-kind values. Identifier-kind values (name, email, phone, address, maid, website, social) are accepted regardless of entity_type. Allowed trait-kind values depend on entity_type:

  • person: affinity, content, demographic, employment, financial, household, intent, interest, lifestyle, political, purchase
  • business: about, appstore, digital, funding, hiring, industry, techstack

Note: domains accepts any string at the input boundary, so values outside the allowed set for the chosen entity_type are not rejected — they are simply not enriched. The column is created but populated with no data, which is easy to mistake for missing source data. Match the value to the right entity type to get a populated column.
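Because mismatched values fail silently, it can help to check domains on the client before calling. The sketch below hard-codes the value lists from this section; unsupportedDomains is a hypothetical helper, and the lists would need updating if the platform adds domains.

```typescript
type EntityType = "person" | "business";

// Identifier-kind values, accepted for either entity type.
const IDENTIFIER_DOMAINS: string[] = [
  "name", "email", "phone", "address", "maid", "website", "social",
];

// Trait-kind values, which depend on entity_type.
const TRAIT_DOMAINS: Record<EntityType, string[]> = {
  person: [
    "affinity", "content", "demographic", "employment", "financial", "household",
    "intent", "interest", "lifestyle", "political", "purchase",
  ],
  business: ["about", "appstore", "digital", "funding", "hiring", "industry", "techstack"],
};

// Returns the requested domains that would produce an empty column.
function unsupportedDomains(entityType: EntityType, domains: string[]): string[] {
  const allowed = new Set<string>([...IDENTIFIER_DOMAINS, ...TRAIT_DOMAINS[entityType]]);
  return domains.filter((domain) => !allowed.has(domain));
}
```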

Each value in domains adds an enrichment column, prefixed with _, to the output CSV.

Output columns added to CSV:

  • _entity_id - Resolved entity ID
  • _match_score - Overall match confidence (0-1)
  • _match_method - How the match was made (composite, single, tiebreaker)
  • _matched_identifiers - Which identifiers matched
  • _tiebreaker_winner - Which identifier type won the tiebreak (always present in the CSV header; only populated for rows resolved via the tiebreaker method)
  • _{domain} - Domain enrichment data (when domains specified)

Request Schema:

interface ResolveAndEnrichRowsParams {
  entity_type: "person" | "business";
  csv_key?: string;
  csv_resource_uri?: string;
  lookup_columns?: {
    email?: { names: string[]; hash_type?: "plaintext" | "md5" | "sha1" | "sha256" };
    phone?: { names: string[]; hash_type?: "plaintext" | "md5" | "sha1" | "sha256" };
    address?: { names: string[] };
    name?: { names: string[] };
    linkedin?: { names: string[] };
    domain?: { names: string[] };
  };
  tiebreaker_hierarchy?: Array<"email" | "phone" | "address" | "name" | "linkedin" | "domain">;
  min_score_threshold?: number;
  domains: string[];
  include_unmatched?: boolean;
  include_score_breakdown?: boolean;
  workflow_id: string;
  offset?: number;
  limit?: number;
}

Output Format

{
  export: {
    download_url: string;
    format: "csv";
    expires_at: string;
  },
  stats: {
    total_rows: number;
    matched_rows: number;
    unmatched_rows: number;
    match_rate: number;
    output_column_count: number;
    by_method: {
      composite: number;
      single: number;
      tiebreaker: number;
    };
    avg_score: number;
  },
  warnings?: Array<{ code: string; message: string }>,
  tool_trace_id: string,
  workflow_id: string
}

Response Fields:

| Field | Type | Description |
| --- | --- | --- |
| export.download_url | string | Presigned URL for enriched CSV |
| export.format | string | Always "csv" |
| export.expires_at | string | ISO 8601 expiration timestamp |
| stats.total_rows | number | Total input rows processed |
| stats.matched_rows | number | Rows with successful matches |
| stats.unmatched_rows | number | Rows with no matches |
| stats.match_rate | number | Match rate (0-1) |
| stats.output_column_count | number | Number of columns in output CSV |
| stats.by_method.composite | number | Rows matched via multiple identifiers |
| stats.by_method.single | number | Rows matched via single identifier |
| stats.by_method.tiebreaker | number | Rows resolved via tiebreaker |
| stats.avg_score | number | Average match score |
| warnings | array | Optional. Non-fatal warnings, e.g. address_parse_low_yield when most address values failed libpostal parsing; typically a sign that the columns under address.names were listed in the wrong order, or that a single mapped column contains only fragments without a postcode |

Example Response:

{
  "export": {
    "download_url": "https://s3.amazonaws.com/bucket/artifacts/550e.../resolved_rows_1705320000.csv?...",
    "format": "csv",
    "expires_at": "2025-01-16T13:00:00Z"
  },
  "stats": {
    "total_rows": 5000,
    "matched_rows": 4250,
    "unmatched_rows": 750,
    "match_rate": 0.85,
    "output_column_count": 12,
    "by_method": {
      "composite": 2100,
      "single": 1800,
      "tiebreaker": 350
    },
    "avg_score": 0.78
  },
  "tool_trace_id": "a1b2c3d4e5f6",
  "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}
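The numbers in this example are internally consistent: 2100 + 1800 + 350 = 4250 matched rows, 4250 + 750 = 5000 total rows, and 4250 / 5000 = 0.85. A response can be sanity-checked the same way. The sketch below assumes those three relationships hold for every response, which the example suggests but this page does not state outright; statsInconsistencies is a hypothetical helper.

```typescript
interface ResolveStats {
  total_rows: number;
  matched_rows: number;
  unmatched_rows: number;
  match_rate: number;
  by_method: { composite: number; single: number; tiebreaker: number };
}

// Returns a description of each relationship that does not hold.
// Assumption: these relationships are invariants of the stats block.
function statsInconsistencies(stats: ResolveStats): string[] {
  const issues: string[] = [];
  const { composite, single, tiebreaker } = stats.by_method;

  if (composite + single + tiebreaker !== stats.matched_rows) {
    issues.push("by_method counts do not sum to matched_rows");
  }
  if (stats.matched_rows + stats.unmatched_rows !== stats.total_rows) {
    issues.push("matched_rows + unmatched_rows does not equal total_rows");
  }
  const expectedRate = stats.total_rows === 0 ? 0 : stats.matched_rows / stats.total_rows;
  // Tolerance allows for a rate that was rounded before being returned.
  if (Math.abs(expectedRate - stats.match_rate) > 0.005) {
    issues.push("match_rate does not equal matched_rows / total_rows");
  }
  return issues;
}
```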

Common Errors

| Condition | Error message |
| --- | --- |
| Both csv_key and csv_resource_uri provided | "csv_key and csv_resource_uri are mutually exclusive. Provide one or the other." |
| Neither csv_key nor csv_resource_uri provided | "Either csv_key or csv_resource_uri must be provided." |
| No identifier columns specified | "At least one identifier column must be specified via lookup_columns (email, phone, address, name, linkedin, or domain)." |
| csv_resource_uri does not point to uploads or artifacts sub-path | "csv_resource_uri must point to uploads or artifacts, got: <sub-path>" |
| Deprecated email_columns key used | "'email_columns' has been removed. Use 'lookup_columns.email' instead." |
| Deprecated phone_columns key used | "'phone_columns' has been removed. Use 'lookup_columns.phone' instead." |
| Deprecated address_columns key used | "'address_columns' has been removed. Use 'lookup_columns.address' instead." |
| Deprecated name_columns key used | "'name_columns' has been removed. Use 'lookup_columns.name' instead." |
| Deprecated linkedin_columns key used | "'linkedin_columns' has been removed. Use 'lookup_columns.linkedin' instead." |
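The CSV source errors can be caught before a request is sent. The following pre-flight check is a sketch that reuses the documented messages; checkCsvSource is a hypothetical helper, the URI pattern is inferred from the workflow://{workflow_id}/uploads/file.csv examples on this page, and the "(none)" placeholder for an unparseable URI is an assumption.

```typescript
interface CsvSourceParams {
  csv_key?: string;
  csv_resource_uri?: string;
}

// Returns the error the request would likely trigger, or undefined if the source looks valid.
function checkCsvSource(params: CsvSourceParams): string | undefined {
  const hasKey = params.csv_key !== undefined;
  const uri = params.csv_resource_uri;

  if (hasKey && uri !== undefined) {
    return "csv_key and csv_resource_uri are mutually exclusive. Provide one or the other.";
  }
  if (!hasKey && uri === undefined) {
    return "Either csv_key or csv_resource_uri must be provided.";
  }
  if (uri !== undefined) {
    // Expected shape: workflow://{workflow_id}/{sub-path}/{file}
    const match = /^workflow:\/\/[^/]+\/([^/]+)\//.exec(uri);
    const subPath = match?.[1];
    if (subPath !== "uploads" && subPath !== "artifacts") {
      // Assumption: "(none)" is a local placeholder, not a server message.
      return `csv_resource_uri must point to uploads or artifacts, got: ${subPath ?? "(none)"}`;
    }
  }
  return undefined;
}
```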

Performance Notes

  • Resolves and enriches each page in a single backend pass (no chunking within a page); output is streamed to S3 to keep memory bounded
  • Files larger than 40,000 rows are processed by calling repeatedly, passing the response's next_offset back as offset until it is omitted (last page)
  • Output maintains exact 1:1 row correspondence with the requested input page

Usage Examples

Example 1: Basic email resolution

{
  "entity_type": "person",
  "csv_resource_uri": "workflow://550e8400-e29b-41d4-a716-446655440000/uploads/customers.csv",
  "lookup_columns": { "email": { "names": ["email"] } },
  "domains": ["demographic"],
  "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}

Example 2: Multi-identifier with enrichment

{
  "entity_type": "person",
  "csv_resource_uri": "workflow://550e8400-e29b-41d4-a716-446655440000/uploads/leads.csv",
  "lookup_columns": {
    "email": { "names": ["work_email", "personal_email"] },
    "phone": { "names": ["phone"] },
    "address": { "names": ["address"] }
  },
  "domains": ["demographic", "employment", "financial"],
  "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}

Example 3: High-quality matches only

{
  "entity_type": "person",
  "csv_key": "prospects.csv",
  "lookup_columns": { "email": { "names": ["email"] } },
  "domains": ["demographic", "employment"],
  "min_score_threshold": 0.8,
  "include_unmatched": false,
  "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}

Example 4: Pre-hashed identifiers

{
  "entity_type": "person",
  "csv_resource_uri": "workflow://550e8400-e29b-41d4-a716-446655440000/uploads/hashed_list.csv",
  "lookup_columns": {
    "email": { "names": ["email_md5"], "hash_type": "md5" },
    "phone": { "names": ["phone_sha256"], "hash_type": "sha256" }
  },
  "domains": ["demographic"],
  "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}

Example 5: LinkedIn-based person resolution

When a CSV contains LinkedIn profile URLs or slugs, use lookup_columns.linkedin to resolve by LinkedIn without falling back to fuzzy name matching (which is unreliable on common names like "John Wu"):

{
  "entity_type": "person",
  "csv_resource_uri": "workflow://550e8400-e29b-41d4-a716-446655440000/uploads/prospects.csv",
  "lookup_columns": { "linkedin": { "names": ["linkedin_url"] } },
  "domains": ["demographic", "employment"],
  "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}

Example 6: Shopify customer export with split-address columns

Shopify customer exports split the address across six columns. List them under address.names in street-first order so the server joins the row's cell values into a canonical address string before parsing.

{
  "entity_type": "person",
  "csv_resource_uri": "workflow://550e8400-e29b-41d4-a716-446655440000/uploads/shopify_customers.csv",
  "lookup_columns": {
    "email": { "names": ["Email"] },
    "phone": { "names": ["Phone", "Default Address Phone"] },
    "address": {
      "names": [
        "Default Address Address1",
        "Default Address Address2",
        "Default Address City",
        "Default Address Province Code",
        "Default Address Zip",
        "Default Address Country Code"
      ]
    },
    "name": { "names": ["First Name", "Last Name"] }
  },
  "domains": ["demographic", "intent", "household"],
  "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}

Example 7: Domain-based business resolution

When a CSV is keyed on a company website, use lookup_columns.domain for business entities. Values can be bare domains (acme.com) or full URLs (https://www.Acme.COM/about); the server normalizes both to the bare lowercase host before matching the website identifier type. The trait-kind values in domains (industry, funding, techstack) are from the business set above.

{
  "entity_type": "business",
  "csv_resource_uri": "workflow://550e8400-e29b-41d4-a716-446655440000/uploads/companies.csv",
  "lookup_columns": { "domain": { "names": ["website"] } },
  "domains": ["industry", "funding", "techstack"],
  "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}
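To preview how differently formatted website values might collapse to one key, the normalization can be approximated locally. This is a rough sketch, not the server's algorithm: the page only states that values are reduced to the bare lowercase host, so stripping a leading "www.", the port, and any userinfo are assumptions, and normalizeDomain is a hypothetical helper.

```typescript
// Approximates reducing a bare domain or full URL to a bare lowercase host.
function normalizeDomain(value: string): string {
  let host = value.trim().toLowerCase();
  host = host.replace(/^[a-z][a-z0-9+.-]*:\/\//, ""); // drop the scheme, if any
  host = host.split(/[/?#]/)[0] ?? "";                // drop path, query, and fragment
  host = host.replace(/^[^@]*@/, "");                 // drop userinfo (assumption)
  host = host.replace(/:\d+$/, "");                   // drop port (assumption)
  return host.replace(/^www\./, "");                  // drop leading www. (assumption)
}
```

Under these assumptions, both acme.com and https://www.Acme.COM/about reduce to acme.com.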

Common CRM column mappings

Reference lookup_columns.address.names orderings for the major CRM CSV exports. All four split the address across several columns; the orderings below are street-first so libpostal sees a canonical address string. Column names are case-sensitive, so copy them exactly as the export emits them.

Shopify (Customers export):

{
  "email": { "names": ["Email"] },
  "phone": { "names": ["Phone", "Default Address Phone"] },
  "address": {
    "names": [
      "Default Address Address1",
      "Default Address Address2",
      "Default Address City",
      "Default Address Province Code",
      "Default Address Zip",
      "Default Address Country Code"
    ]
  },
  "name": { "names": ["First Name", "Last Name"] }
}

HubSpot (Contacts export):

{
  "email": { "names": ["Email"] },
  "phone": { "names": ["Phone Number", "Mobile Phone Number"] },
  "address": {
    "names": ["Street Address", "Street Address 2", "City", "State/Region", "Postal Code", "Country/Region"]
  },
  "name": { "names": ["First Name", "Last Name"] }
}

Salesforce (Contact / Lead report):

{
  "email": { "names": ["Email"] },
  "phone": { "names": ["Phone", "MobilePhone"] },
  "address": {
    "names": ["MailingStreet", "MailingCity", "MailingState", "MailingPostalCode", "MailingCountry"]
  },
  "name": { "names": ["FirstName", "LastName"] }
}

For Salesforce Leads, swap the Mailing* columns for Street, City, State, PostalCode, Country.

BigCommerce (Customer export):

{
  "email": { "names": ["Email"] },
  "phone": { "names": ["Phone"] },
  "address": {
    "names": [
      "Address Line 1",
      "Address Line 2",
      "Suburb/City",
      "State",
      "Zip/Postcode",
      "Country"
    ]
  },
  "name": { "names": ["First Name", "Last Name"] }
}

For each platform, drop any columns your specific export doesn't include (e.g. omit Default Address Address2 if the source doesn't capture it). Empty cells are skipped during the per-row join, so leaving a sometimes-blank column in the list is fine.
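Dropping absent columns can be automated by filtering a reference list against the CSV's actual header row. A minimal sketch follows; pruneToHeader is a hypothetical helper, and it compares names case-sensitively to match the note above.

```typescript
// Keeps only the mapped column names that exist in the CSV header, preserving order.
function pruneToHeader(names: string[], header: string[]): string[] {
  const present = new Set<string>(header);
  return names.filter((name) => present.has(name)); // exact, case-sensitive match
}

// Example: a Shopify export that lacks the Address2 column.
const shopifyAddress = [
  "Default Address Address1",
  "Default Address Address2",
  "Default Address City",
  "Default Address Province Code",
  "Default Address Zip",
  "Default Address Country Code",
];
const exportHeader = [
  "First Name", "Last Name", "Email",
  "Default Address Address1", "Default Address City",
  "Default Address Province Code", "Default Address Zip",
  "Default Address Country Code",
];
const prunedAddress = pruneToHeader(shopifyAddress, exportHeader);
```

Because the filter preserves the reference order, the street-first ordering survives the pruning.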
