Resolve each row of a CSV file to a platform entity ID and optionally append enrichment data, preserving the original row structure.

Quick Example

{
  "entity_type": "person",
  "csv_resource_uri": "workflow://550e8400-e29b-41d4-a716-446655440000/uploads/customers.csv",
  "email_columns": { "names": ["email"] },
  "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}

Input Parameters

| Parameter | Type | Required | Default | Constraints | Description |
| --- | --- | --- | --- | --- | --- |
| entity_type | string | Yes | - | "person" or "business" | Type of entity to resolve |
| csv_key | string | Conditional | - | Filename | CSV filename from generate_upload_url. Mutually exclusive with csv_resource_uri |
| csv_resource_uri | string | Conditional | - | workflow:// URI | CSV resource URI. Mutually exclusive with csv_key |
| email_columns | object | No | - | { names: string[], hash_type?: string } | Columns containing email addresses |
| phone_columns | object | No | - | { names: string[], hash_type?: string } | Columns containing phone numbers |
| address_columns | object | No | - | { names: string[], hash_type?: string } | Columns containing physical addresses |
| name_columns | object | No | - | { names: string[], hash_type?: string } | Columns containing entity names |
| tiebreaker_hierarchy | array | No | ["email", "phone", "address", "name"] | Ordered array | Priority for divergent identifier resolution |
| min_score_threshold | number | No | 0.0 | 0.0-1.0 | Minimum match score threshold |
| domains | array | No | - | Enrichment domains | Optional enrichment domains to include |
| contact_types | array | No | - | "name", "email", "phone", "address" | Contact types from resolved profiles |
| include_unmatched | boolean | No | true | - | Include rows with no matches in output |
| include_score_breakdown | boolean | No | false | - | Include detailed score breakdown per identifier |
| workflow_id | string | Yes | - | Valid UUID | Workflow session ID from generate_upload_url |

Parameter Details:

csv_key vs csv_resource_uri:

  • Provide exactly one of csv_key or csv_resource_uri. They are mutually exclusive.
  • Whichever source you use, at least one identifier column group must also be specified (email_columns, phone_columns, address_columns, or name_columns)
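
The two rules above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the documented API: the function name validateSourceAndIdentifiers and the SourceAndIdentifiers type are hypothetical, and treating an empty names array as "not specified" is an assumption.

```typescript
// Hypothetical pre-flight check; names and error messages are illustrative only.
type HashType = "plaintext" | "md5" | "sha1" | "sha256";

interface IdentifierColumns {
  names: string[];
  hash_type?: HashType;
}

interface SourceAndIdentifiers {
  csv_key?: string;
  csv_resource_uri?: string;
  email_columns?: IdentifierColumns;
  phone_columns?: IdentifierColumns;
  address_columns?: IdentifierColumns;
  name_columns?: IdentifierColumns;
}

// Returns a list of problems; an empty list means both rules are satisfied.
function validateSourceAndIdentifiers(p: SourceAndIdentifiers): string[] {
  const errors: string[] = [];

  // Rule 1: exactly one of csv_key / csv_resource_uri.
  const hasKey = p.csv_key !== undefined;
  const hasUri = p.csv_resource_uri !== undefined;
  if (hasKey === hasUri) {
    errors.push("Provide exactly one of csv_key or csv_resource_uri.");
  }

  // Rule 2: at least one identifier column group.
  // Assumption: a group with an empty names array does not count as specified.
  const groups = [p.email_columns, p.phone_columns, p.address_columns, p.name_columns];
  const hasIdentifier = groups.some((g) => g !== undefined && g.names.length > 0);
  if (!hasIdentifier) {
    errors.push("Specify at least one identifier column group.");
  }

  return errors;
}
```

For instance, passing { csv_key: "prospects.csv", email_columns: { names: ["email"] } } yields an empty list, while an empty object yields both messages.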

email_columns / phone_columns / address_columns / name_columns:

  • names - Array of CSV column names containing the identifier values
  • hash_type - Hash format of values in those columns: "plaintext" (default), "md5", "sha1", or "sha256"
  • Use hash_type when your CSV contains pre-hashed identifiers
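
If you need to produce a pre-hashed column yourself, something like the sketch below could work in Node.js. The helper name hashIdentifier is hypothetical, and the normalization step (trim and lowercase before hashing) is an assumption: this page does not state how values must be normalized, so confirm the expected form before hashing production data.

```typescript
import { createHash } from "crypto";

type HashedType = "md5" | "sha1" | "sha256";

// Hypothetical helper. Assumption: values are trimmed and lowercased before
// hashing; the platform's actual normalization rules are not documented here.
function hashIdentifier(value: string, hashType: HashedType): string {
  const normalized = value.trim().toLowerCase();
  // Hex-encoded digest: 32 chars for md5, 40 for sha1, 64 for sha256.
  return createHash(hashType).update(normalized, "utf8").digest("hex");
}

// Example: fill an "email_md5" column, then pass
// "email_columns": { "names": ["email_md5"], "hash_type": "md5" } in the request.
const emailMd5 = hashIdentifier(" Jane.Doe@Example.com ", "md5");
```

All values in one column group share a single hash_type, so columns hashed with different algorithms belong in different groups or need re-hashing to a common format.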

tiebreaker_hierarchy:

  • When multiple identifiers resolve to different entities, the hierarchy determines which entity wins
  • Default: ["email", "phone", "address", "name"] (email has highest priority)
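
To make the priority rule concrete, here is a small illustration of how an ordered hierarchy can pick a winner when identifiers disagree. It is only a sketch of the idea: the Candidates shape and pickTiebreakWinner are hypothetical, and the platform's real resolution logic may take other factors (such as match scores) into account.

```typescript
type IdentifierType = "email" | "phone" | "address" | "name";

const DEFAULT_HIERARCHY: IdentifierType[] = ["email", "phone", "address", "name"];

// Hypothetical shape: the entity ID that each identifier type resolved to for one row.
type Candidates = Partial<Record<IdentifierType, string>>;

// Walks the hierarchy in order and returns the first identifier type that
// resolved to an entity, or undefined if nothing resolved.
function pickTiebreakWinner(
  candidates: Candidates,
  hierarchy: IdentifierType[] = DEFAULT_HIERARCHY
): { winner: IdentifierType; entityId: string } | undefined {
  for (const type of hierarchy) {
    const entityId = candidates[type];
    if (entityId !== undefined) {
      return { winner: type, entityId };
    }
  }
  return undefined;
}

// Email and phone point at different entities; email wins under the default order.
const row: Candidates = { phone: "entity-B", email: "entity-A" };
const byDefault = pickTiebreakWinner(row);
// Reordering the hierarchy makes phone win instead.
const phoneFirst = pickTiebreakWinner(row, ["phone", "email", "address", "name"]);
```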

domains (enrichment):

  • Available: demographic, affinity, content, employment, financial, household, id, intent, interest, lifestyle, political, purchase
  • When specified, adds enrichment columns (prefixed with _) to the output CSV

Output columns added to CSV:

  • _entity_id - Resolved entity ID
  • _match_score - Overall match confidence (0-1)
  • _match_method - How the match was made (composite, single, tiebreaker)
  • _matched_identifiers - Which identifiers matched
  • _tiebreaker_winner - Which identifier type won the tiebreak
  • _enriched_{contact_type} - Enriched contact data (when contact_types specified)
  • _{domain} - Domain enrichment data (when domains specified)
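
When parsing the output CSV, it can help to know in advance which extra headers to expect. The sketch below builds that list from the naming patterns above. The function addedColumns is hypothetical, and it assumes each pattern maps to exactly one literal column name per contact type or domain. Column order is not documented, and include_score_breakdown may add further columns that are not listed here.

```typescript
type ContactType = "name" | "email" | "phone" | "address";
type Domain =
  | "demographic" | "affinity" | "content" | "employment" | "financial" | "household"
  | "id" | "intent" | "interest" | "lifestyle" | "political" | "purchase";

// Columns listed above that do not depend on contact_types or domains.
const BASE_COLUMNS: string[] = [
  "_entity_id",
  "_match_score",
  "_match_method",
  "_matched_identifiers",
  "_tiebreaker_winner",
];

// Hypothetical helper. Assumption: one column per contact type and per domain,
// named exactly after the documented patterns.
function addedColumns(contactTypes: ContactType[] = [], domains: Domain[] = []): string[] {
  return [
    ...BASE_COLUMNS,
    ...contactTypes.map((c) => `_enriched_${c}`),
    ...domains.map((d) => `_${d}`),
  ];
}

const expected = addedColumns(["phone", "email"], ["demographic", "employment"]);
```

Treat the result as a set to look up by header name rather than as a positional layout.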

Request Schema:

interface ResolveAndEnrichRowsParams {
  entity_type: "person" | "business";
  csv_key?: string;
  csv_resource_uri?: string;
  email_columns?: { names: string[]; hash_type?: "plaintext" | "md5" | "sha1" | "sha256" };
  phone_columns?: { names: string[]; hash_type?: "plaintext" | "md5" | "sha1" | "sha256" };
  address_columns?: { names: string[]; hash_type?: "plaintext" | "md5" | "sha1" | "sha256" };
  name_columns?: { names: string[]; hash_type?: "plaintext" | "md5" | "sha1" | "sha256" };
  tiebreaker_hierarchy?: Array<"email" | "phone" | "address" | "name">;
  min_score_threshold?: number;
  domains?: Array<"demographic" | "affinity" | "content" | "employment" | "financial" | "household" | "id" | "intent" | "interest" | "lifestyle" | "political" | "purchase">;
  contact_types?: Array<"name" | "email" | "phone" | "address">;
  include_unmatched?: boolean;
  include_score_breakdown?: boolean;
  workflow_id: string;
}

Output Format

{
  export: {
    download_url: string;
    format: "csv";
    expires_at: string;
  };
  stats: {
    total_rows: number;
    matched_rows: number;
    unmatched_rows: number;
    match_rate: number;
    by_method: {
      composite: number;
      single: number;
      tiebreaker: number;
    };
    avg_score: number;
  };
  tool_trace_id: string;
  workflow_id: string;
}

Response Fields:

| Field | Type | Description |
| --- | --- | --- |
| export.download_url | string | Presigned URL for enriched CSV |
| export.format | string | Always "csv" |
| export.expires_at | string | ISO 8601 expiration timestamp |
| stats.total_rows | number | Total input rows processed |
| stats.matched_rows | number | Rows with successful matches |
| stats.unmatched_rows | number | Rows with no matches |
| stats.match_rate | number | Match rate (0-1) |
| stats.by_method.composite | number | Rows matched via multiple identifiers |
| stats.by_method.single | number | Rows matched via single identifier |
| stats.by_method.tiebreaker | number | Rows resolved via tiebreaker |
| stats.avg_score | number | Average match score |

Example Response:

{
  "export": {
    "download_url": "https://s3.amazonaws.com/bucket/artifacts/550e.../resolved_rows_1705320000.csv?...",
    "format": "csv",
    "expires_at": "2025-01-16T13:00:00Z"
  },
  "stats": {
    "total_rows": 5000,
    "matched_rows": 4250,
    "unmatched_rows": 750,
    "match_rate": 0.85,
    "by_method": {
      "composite": 2100,
      "single": 1800,
      "tiebreaker": 350
    },
    "avg_score": 0.78
  },
  "tool_trace_id": "a1b2c3d4e5f6",
  "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}
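
In the example response the counts line up: matched_rows + unmatched_rows equals total_rows, the by_method counts sum to matched_rows, and match_rate equals matched_rows / total_rows (4250 / 5000 = 0.85). A client could use those relationships as a sanity check, as sketched below. Note that this page does not state them as guarantees, so they are assumptions drawn from the example, and checkStats is a hypothetical helper.

```typescript
interface Stats {
  total_rows: number;
  matched_rows: number;
  unmatched_rows: number;
  match_rate: number;
  by_method: { composite: number; single: number; tiebreaker: number };
  avg_score: number;
}

// Hypothetical sanity check. The relationships tested here hold for the example
// response but are assumptions, not documented guarantees.
function checkStats(s: Stats, tolerance = 1e-3): string[] {
  const problems: string[] = [];

  if (s.matched_rows + s.unmatched_rows !== s.total_rows) {
    problems.push("matched_rows + unmatched_rows does not equal total_rows");
  }

  const methodTotal = s.by_method.composite + s.by_method.single + s.by_method.tiebreaker;
  if (methodTotal !== s.matched_rows) {
    problems.push("by_method counts do not sum to matched_rows");
  }

  // Allow a small tolerance in case the service rounds match_rate.
  const expectedRate = s.total_rows === 0 ? 0 : s.matched_rows / s.total_rows;
  if (Math.abs(s.match_rate - expectedRate) > tolerance) {
    problems.push("match_rate does not equal matched_rows / total_rows");
  }

  return problems;
}

// Stats copied from the example response.
const exampleStats: Stats = {
  total_rows: 5000,
  matched_rows: 4250,
  unmatched_rows: 750,
  match_rate: 0.85,
  by_method: { composite: 2100, single: 1800, tiebreaker: 350 },
  avg_score: 0.78,
};
const exampleProblems = checkStats(exampleStats);
```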

Performance Notes

  • Processes in streaming batches of 10,000 rows for memory efficiency
  • Suitable for files with millions of rows
  • Output maintains exact 1:1 row correspondence with input
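
The batching behavior described above happens on the service side and its implementation is not shown on this page. As a rough illustration of the general technique only, the hypothetical generator below groups any row sequence into fixed-size batches while preserving order, which is what keeps memory bounded and output rows aligned with input rows.

```typescript
// Illustrative only: a generic batching generator, not the service's actual code.
// Yields arrays of at most batchSize rows, in the original order.
function* inBatches<T>(rows: Iterable<T>, batchSize = 10000): Generator<T[]> {
  if (!Number.isInteger(batchSize) || batchSize <= 0) {
    throw new RangeError("batchSize must be a positive integer");
  }
  let batch: T[] = [];
  for (const row of rows) {
    batch.push(row);
    if (batch.length === batchSize) {
      yield batch;
      batch = [];
    }
  }
  // Emit the final, possibly shorter, batch.
  if (batch.length > 0) {
    yield batch;
  }
}

// Small demonstration with a batch size of 2 instead of 10,000.
const batchSizes: number[] = [];
const flattened: number[] = [];
for (const batch of inBatches([1, 2, 3, 4, 5], 2)) {
  batchSizes.push(batch.length);
  for (const row of batch) {
    flattened.push(row);
  }
}
```

Because only one batch is held at a time, peak memory depends on the batch size rather than on the total number of rows.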

Usage Examples

Example 1: Basic email resolution

{
  "entity_type": "person",
  "csv_resource_uri": "workflow://550e8400-e29b-41d4-a716-446655440000/uploads/customers.csv",
  "email_columns": { "names": ["email"] },
  "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}

Example 2: Multi-identifier with enrichment

{
  "entity_type": "person",
  "csv_resource_uri": "workflow://550e8400-e29b-41d4-a716-446655440000/uploads/leads.csv",
  "email_columns": { "names": ["work_email", "personal_email"] },
  "phone_columns": { "names": ["phone"] },
  "address_columns": { "names": ["address"] },
  "domains": ["demographic", "employment", "financial"],
  "contact_types": ["phone", "email"],
  "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}

Example 3: High-quality matches only

{
  "entity_type": "person",
  "csv_key": "prospects.csv",
  "email_columns": { "names": ["email"] },
  "min_score_threshold": 0.8,
  "include_unmatched": false,
  "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}

Example 4: Pre-hashed identifiers

{
  "entity_type": "person",
  "csv_resource_uri": "workflow://550e8400-e29b-41d4-a716-446655440000/uploads/hashed_list.csv",
  "email_columns": { "names": ["email_md5"], "hash_type": "md5" },
  "phone_columns": { "names": ["phone_sha256"], "hash_type": "sha256" },
  "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}