You’re viewing the V1 docs. V2 is now recommended — read the V2 docs.
Watt Data

ICP Analysis Workflow

Overview

The Ideal Customer Profile (ICP) Analysis workflow demonstrates how to analyze your existing customer base to identify defining characteristics, then use those insights to find lookalike audiences.

Recommended Approach: Use the analyze_customers and generate_audience workflow tools for automated ICP analysis with AI-driven insights. These tools combine identity resolution, profile enrichment, cluster analysis, and audience discovery into streamlined operations.

Manual Approach: This guide documents the manual multi-step process for integrators who need fine-grained control over each step.

Use Case

Goal: Given a list of customer identifiers (emails, phone numbers, or addresses), identify the common characteristics of your best customers and find similar people in the broader population.

Business Value:

  • Understand what makes your customers similar
  • Identify market segments and personas
  • Build lookalike audiences for marketing campaigns
  • Optimize customer acquisition targeting

Workflow Steps

Step 1: Resolve Identifiers to Person IDs

Convert your customer identifiers (emails, phones, addresses) into standardized person IDs that can be used across the platform.

Tool: resolve_identities

Request:

{
  "jsonrpc": "2.0",
  "method": "tools/call",
  "params": {
    "name": "resolve_identities",
    "arguments": {
      "multi_identifiers": [
        {
          "id_type": "email",
          "hash_type": "plaintext",
          "values": [
            "customer1@example.com",
            "customer2@example.com",
            "customer3@example.com"
          ]
        },
        {
          "id_type": "phone",
          "hash_type": "plaintext",
          "values": [
            "5551234567",
            "5559876543"
          ]
        }
      ],
      "format": "json"
    }
  },
  "id": 1
}

Response:

{
  "jsonrpc": "2.0",
  "result": {
    "identities": [],
    "stats": {
      "requested": 5,
      "resolved": 4,
      "rate": 0.8
    },
    "export": {
      "url": "https://s3.amazonaws.com/presigned-url...",
      "format": "json",
      "rows": 4,
      "size_bytes": 2048,
      "expires_at": "2025-11-20T13:00:00.000Z"
    },
    "tool_trace_id": "abc123...",
    "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
  },
  "id": 1
}

Download the JSON export from export.url to get the full identity resolution data:

[
  {
    "person_id": 12345,
    "overall_quality_score": 0.95,
    "matches": [
      {
        "criterion_type": "email_plaintext",
        "criterion_value": "customer1@example.com",
        "quality_score": 0.95
      }
    ],
    "identifiers": {
      "email": ["customer1@example.com", "alt1@example.com"],
      "phone": ["5551234567"]
    }
  },
  {
    "person_id": 67890,
    "overall_quality_score": 0.88,
    "matches": [
      {
        "criterion_type": "email_plaintext",
        "criterion_value": "customer2@example.com",
        "quality_score": 0.88
      }
    ],
    "identifiers": {
      "email": ["customer2@example.com"]
    }
  }
]

Key Points:

  • Use multi_identifiers to search across multiple identifier types simultaneously
  • format: "json" returns a download URL for the full dataset (use format: "none" for inline results with small datasets)
  • Filter results by overall_quality_score (e.g., >= 0.5) to ensure match quality
  • The resolved person_id values are used in subsequent enrichment steps
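
The quality filter from the key points can be sketched as follows. This is a minimal sketch, assuming the export has already been downloaded and parsed into a list of records shaped like the example above; the helper name `select_person_ids` and the default threshold are illustrative, not part of the platform.

```python
def select_person_ids(records, min_score=0.5):
    """Return person_id values whose overall_quality_score meets the threshold.

    `records` is assumed to be the parsed JSON export from resolve_identities.
    Records without a score are treated as score 0.0 and dropped.
    """
    return [
        record["person_id"]
        for record in records
        if record.get("overall_quality_score", 0.0) >= min_score
    ]


# Records shaped like the export above (other fields omitted for brevity).
# The third record is a made-up low-quality match for illustration.
export_records = [
    {"person_id": 12345, "overall_quality_score": 0.95},
    {"person_id": 67890, "overall_quality_score": 0.88},
    {"person_id": 11111, "overall_quality_score": 0.32},
]

person_ids = select_person_ids(export_records, min_score=0.5)
print(person_ids)  # [12345, 67890]
```

The resulting list can be passed as `person_ids` to the enrichment step.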

Step 2: Enrich Person Profiles

Retrieve detailed demographic, behavioral, and interest data for the resolved person IDs.

Tool: get_person

Request:

{
  "jsonrpc": "2.0",
  "method": "tools/call",
  "params": {
    "name": "get_person",
    "arguments": {
      "person_ids": [12345, 67890, 11111, 22222],
      "domains": [
        "address", "affinity", "content", "demographic", "email",
        "employment", "financial", "household", "id", "intent_category",
        "intent_topic", "interest", "lifestyle", "maid", "name",
        "phone", "political", "purchase"
      ],
      "format": "none"
    }
  },
  "id": 2
}

Response:

{
  "jsonrpc": "2.0",
  "result": {
    "profiles": [
      {
        "person_id": "12345",
        "metadata": {
          "quality_score": 0.92,
          "last_modified": "2025-01-15T10:30:00Z"
        },
        "domains": {
          "gender": "Female",
          "generation": "Millennial",
          "interested_fitness": "Yes",
          "interested_healthy_living": "Yes",
          "fitness_affinity": "High",
          "household_income_range": "$75K-$100K",
          "email1": "customer1@example.com",
          "email2": "alt1@example.com",
          "phone1": "5551234567",
          "first_name": "Jane",
          "last_name": "Smith"
        }
      },
      {
        "person_id": "67890",
        "metadata": {
          "quality_score": 0.88
        },
        "domains": {
          "gender": "Female",
          "generation": "Gen X",
          "interested_healthy_living": "Yes",
          "interested_outdoors": "Yes",
          "email1": "customer2@example.com",
          "first_name": "Sarah"
        }
      }
    ],
    "tool_trace_id": "def456...",
    "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
  },
  "id": 2
}

Key Points:

  • Request all available domains to get a comprehensive view
  • The domains field contains attribute key-value pairs (e.g., "gender": "Female")
  • Attributes are returned as flat key-value pairs, not nested by domain category
  • Multi-value fields like email and phone are numbered (email1, email2, email3, etc.)
  • To use attributes in find_persons, you need to map them to cluster IDs using list_clusters
  • The profiles array is always populated, even when using export format
  • Batch requests support up to 1000 person IDs per call
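
Because multi-value fields arrive as numbered flat keys, it can help to regroup them before analysis. The sketch below assumes the flat `domains` mapping shown in the response above; `collect_numbered` is a hypothetical helper name.

```python
import re


def collect_numbered(attributes, base):
    """Gather values of keys like base1, base2, ... in numeric order."""
    pattern = re.compile(rf"^{re.escape(base)}(\d+)$")
    found = []
    for key, value in attributes.items():
        match = pattern.match(key)
        if match:
            found.append((int(match.group(1)), value))
    # Sort on the numeric suffix only, so email10 follows email9.
    found.sort(key=lambda pair: pair[0])
    return [value for _, value in found]


domains = {
    "gender": "Female",
    "email2": "alt1@example.com",
    "email1": "customer1@example.com",
    "phone1": "5551234567",
}

print(collect_numbered(domains, "email"))
# ['customer1@example.com', 'alt1@example.com']
```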

Step 3: Analyze Attribute Patterns and Intersections

Aggregate the enriched profiles to identify common characteristics and combinations of attributes that frequently co-occur. This reveals the true ICP by finding the intersection of traits.

Analysis Approach:

  1. Extract attribute sets - Pull out relevant attributes from each profile, excluding contact info
  2. Count frequencies - Tally how often each attribute appears across all profiles
  3. Find top attributes - Identify the most common individual characteristics
  4. Discover intersections - Find pairs of attributes that co-occur frequently
  5. Build multi-attribute clusters - Identify 3+ attributes that define cohesive segments
  6. Calculate lift scores - Measure how much more likely attributes co-occur than by chance

Key Metrics:

  • Frequency: Percentage of customers with each attribute
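  • Lift Score: Ratio of observed co-occurrence to the co-occurrence expected if the attributes were independent, i.e. P(A and B) / (P(A) × P(B)); a value above 1.0 indicates a positive association
  • Intersection Rate: Percentage of customers who have every attribute in a given combination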

Analysis Output Structure:

  • Top Attributes: Individual characteristics sorted by frequency (baseline view)
  • Top Intersections: Pairs of attributes with high co-occurrence and lift scores
  • Multi-Attribute Clusters: Groups of 3+ attributes representing cohesive segments (true ICP)

Key Insights:

  • Top Attributes: Individual characteristics sorted by frequency
  • Top Intersections: Pairs of attributes that co-occur frequently
    • lift > 1.0 means they appear together more than random chance
    • Higher lift = stronger association between attributes
  • Multi-Attribute Clusters: 3+ attributes that define cohesive segments
    • These represent your true ICP - the combination of traits that define your best customers
    • Use these for high-precision targeting (AND logic)
  • Strategy:
    • Use multi-attribute clusters with AND for precision (smaller, high-quality audience)
    • Use top individual attributes with OR for reach (larger, broader audience)
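
The frequency and lift calculations described in this step can be sketched as follows. This is a minimal sketch, not the platform's own analysis: it computes lift within the customer set only, assumes attribute values are strings, and uses an illustrative list of contact-field prefixes (`CONTACT_PREFIXES`) to exclude contact info.

```python
from collections import Counter
from itertools import combinations

# Assumption: keys starting with these prefixes are contact info, not traits.
CONTACT_PREFIXES = ("email", "phone", "first_name", "last_name")


def attribute_set(domains):
    """Return the (name, value) pairs of a profile, excluding contact info."""
    return {
        (name, value)
        for name, value in domains.items()
        if not name.startswith(CONTACT_PREFIXES)
    }


def analyze(profiles):
    """Return (frequency, lift) dictionaries for a list of enriched profiles.

    frequency maps each attribute to the share of profiles that have it.
    lift maps each attribute pair to P(A and B) / (P(A) * P(B)).
    """
    total = len(profiles)
    if total == 0:
        raise ValueError("at least one profile is required")
    sets = [attribute_set(profile["domains"]) for profile in profiles]
    single_counts = Counter(attr for attrs in sets for attr in attrs)
    pair_counts = Counter(
        pair for attrs in sets for pair in combinations(sorted(attrs), 2)
    )
    frequency = {attr: count / total for attr, count in single_counts.items()}
    lift = {
        (a, b): (count / total) / (frequency[a] * frequency[b])
        for (a, b), count in pair_counts.items()
    }
    return frequency, lift


# Made-up profiles for illustration only.
profiles = [
    {"domains": {"gender": "Female", "interested_fitness": "Yes", "email1": "a@example.com"}},
    {"domains": {"gender": "Female", "interested_fitness": "Yes"}},
    {"domains": {"gender": "Female"}},
    {"domains": {"gender": "Male"}},
]

frequency, lift = analyze(profiles)
for attr, share in sorted(frequency.items(), key=lambda item: -item[1]):
    print(attr, round(share, 2))
for pair, score in lift.items():
    print(pair, round(score, 2))
```

In this toy data, 75% of profiles are Female and 50% are interested in fitness, while 50% have both, so the pair's lift is 0.5 / (0.75 × 0.5) ≈ 1.33. Multi-attribute clusters can be found the same way by counting combinations of three or more attributes.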

Step 4: Map Attributes to Cluster IDs

To use attributes in find_persons, you need to look up their cluster IDs using list_clusters.

Tool: list_clusters

Request:

{
  "jsonrpc": "2.0",
  "method": "tools/call",
  "params": {
    "name": "list_clusters",
    "arguments": {
      "cluster_names": [
        "gender",
        "interested_healthy_living",
        "fitness_affinity",
        "household_income_range",
        "interested_fitness"
      ]
    }
  },
  "id": 4
}

Response:

{
  "jsonrpc": "2.0",
  "result": {
    "clusters": [
      {
        "cluster_id": "1000000145",
        "cluster_hash": "a1b2c3d4e5f67890a1b2c3d4e5f67890",
        "name": "gender",
        "value": "Female",
        "size": 85000000,
        "domain": "demographic"
      },
      {
        "cluster_id": "1000001556",
        "cluster_hash": "b2c3d4e5f67890a1b2c3d4e5f67890a1",
        "name": "interested_healthy_living",
        "value": "Yes",
        "size": 45000000,
        "domain": "interest"
      },
      {
        "cluster_id": "1000000045",
        "cluster_hash": "c3d4e5f67890a1b2c3d4e5f67890a1b2",
        "name": "fitness_affinity",
        "value": "High",
        "size": 12000000,
        "domain": "affinity"
      },
      {
        "cluster_id": "1000001234",
        "cluster_hash": "d4e5f67890a1b2c3d4e5f67890a1b2c3",
        "name": "household_income_range",
        "value": "$75K-$100K",
        "size": 28000000,
        "domain": "household"
      },
      {
        "cluster_id": "1000001523",
        "cluster_hash": "e5f67890a1b2c3d4e5f67890a1b2c3d4",
        "name": "interested_fitness",
        "value": "Yes",
        "size": 35000000,
        "domain": "interest"
      }
    ],
    "total": 5,
    "returned": 5,
    "tool_trace_id": "xyz789...",
    "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
  },
  "id": 4
}

Key Points:

  • Each cluster represents a specific attribute value (e.g., gender=Female)
  • cluster_hash is a 32-character hex string - use this in boolean expressions for find_persons
  • cluster_hash is stable across data rebuilds (recommended over cluster_id)
  • size shows the population size for each cluster (useful for estimating audience size)
  • You must match both name AND value to your attributes
  • Use cluster hashes in the next step to build your lookalike audience
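
Matching on both name and value can be sketched with a lookup keyed on the pair. This assumes the `clusters` array from the response above has been parsed into a list of dictionaries; `hashes_for` is a hypothetical helper name.

```python
def hashes_for(attributes, clusters):
    """Map (name, value) attributes to cluster_hash values.

    Attributes with no matching cluster are omitted from the result,
    so callers can detect gaps by comparing lengths.
    """
    index = {(c["name"], c["value"]): c["cluster_hash"] for c in clusters}
    return {attr: index[attr] for attr in attributes if attr in index}


# Subset of the list_clusters response shown above.
clusters = [
    {"name": "gender", "value": "Female",
     "cluster_hash": "a1b2c3d4e5f67890a1b2c3d4e5f67890"},
    {"name": "interested_healthy_living", "value": "Yes",
     "cluster_hash": "b2c3d4e5f67890a1b2c3d4e5f67890a1"},
    {"name": "fitness_affinity", "value": "High",
     "cluster_hash": "c3d4e5f67890a1b2c3d4e5f67890a1b2"},
]

icp_attributes = [
    ("gender", "Female"),
    ("fitness_affinity", "High"),
    ("fitness_affinity", "Low"),  # no cluster for this value in the sample
]

print(hashes_for(icp_attributes, clusters))
```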

Step 5: Find Lookalike Audience

Use the cluster hashes from Step 4 to find similar people. You can use two strategies:

  • Precision Targeting (AND): Find people matching all key attributes
  • Reach Targeting (OR): Find people matching any key attribute

Tool: find_persons

Precision Approach (Recommended): Use the multi-attribute cluster to find high-quality matches. This targets the core ICP.

{
  "jsonrpc": "2.0",
  "method": "tools/call",
  "params": {
    "name": "find_persons",
    "arguments": {
      "expression": "a1b2c3d4e5f67890a1b2c3d4e5f67890 AND b2c3d4e5f67890a1b2c3d4e5f67890a1 AND c3d4e5f67890a1b2c3d4e5f67890a1b2",
      "identifier_type": "email"
    }
  },
  "id": 5
}

Reach Approach: Use individual top clusters to cast a wider net for discovery.

{
  "jsonrpc": "2.0",
  "method": "tools/call",
  "params": {
    "name": "find_persons",
    "arguments": {
      "expression": "a1b2c3d4e5f67890a1b2c3d4e5f67890 OR b2c3d4e5f67890a1b2c3d4e5f67890a1 OR c3d4e5f67890a1b2c3d4e5f67890a1b2",
      "identifier_type": "email"
    }
  },
  "id": 5
}

Hybrid Approach: Combine required attributes (AND) with optional attributes (OR) for balanced targeting.

{
  "jsonrpc": "2.0",
  "method": "tools/call",
  "params": {
    "name": "find_persons",
    "arguments": {
      "expression": "(a1b2c3d4e5f67890a1b2c3d4e5f67890 AND b2c3d4e5f67890a1b2c3d4e5f67890a1) OR (a1b2c3d4e5f67890a1b2c3d4e5f67890 AND c3d4e5f67890a1b2c3d4e5f67890a1b2)",
      "identifier_type": "email"
    }
  },
  "id": 5
}

Response (Precision Approach):

{
  "jsonrpc": "2.0",
  "result": {
    "total": 450000,
    "sample": [
      {
        "person_id": 99999,
        "identifiers": {
          "email": {
            "email1": "prospect1@example.com",
            "email2": "prospect1-alt@example.com"
          }
        }
      },
      {
        "person_id": 88888,
        "identifiers": {
          "email": {
            "email1": "prospect2@example.com"
          }
        }
      }
    ],
    "tool_trace_id": "ghi789...",
    "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
  },
  "id": 5
}

Response (Reach Approach):

{
  "jsonrpc": "2.0",
  "result": {
    "total": 8500000,
    "sample": [...]
  }
}

Key Points:

  • expression combines cluster identifiers with boolean operators (AND, OR, NOT); the requests above use cluster_hash values, which Step 4 recommends over cluster_id
  • Precision (AND): Smaller audience, higher match quality
    • Example: 450K people matching ALL 3 attributes
    • Best for: High-value campaigns, limited budgets, quality over quantity
  • Reach (OR): Larger audience, broader targeting
    • Example: 8.5M people matching ANY of 3 attributes
    • Best for: Brand awareness, discovery, testing new segments
  • Hybrid: Balanced approach using combinations
    • Example: Multiple AND groups connected by OR
    • Best for: Testing multiple ICP variants simultaneously
  • total shows the full audience size (use this to estimate campaign reach)
  • sample provides up to 10 preview records
  • Use identifier_type to specify which contact info to return (email, phone, or address)
  • For full export, add "format": "csv" or "format": "json"
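
Expression strings for the precision, reach, and hybrid approaches can be assembled programmatically. The helpers below (`join_terms`, `grouped`, `excluding`) are hypothetical names for a minimal sketch; they only build strings in the form shown in the requests above and do not validate them against the service.

```python
def join_terms(terms, operator):
    """Join cluster identifiers (or sub-expressions) with AND or OR."""
    terms = list(terms)
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    if not terms:
        raise ValueError("at least one term is required")
    return f" {operator} ".join(terms)


def grouped(expression):
    """Wrap a sub-expression in parentheses."""
    return f"({expression})"


def excluding(expression, excluded_term):
    """Append an AND NOT clause to an expression."""
    return f"{grouped(expression)} AND NOT {excluded_term}"


female = "a1b2c3d4e5f67890a1b2c3d4e5f67890"
healthy = "b2c3d4e5f67890a1b2c3d4e5f67890a1"
fitness = "c3d4e5f67890a1b2c3d4e5f67890a1b2"

precision = join_terms([female, healthy, fitness], "AND")
reach = join_terms([female, healthy, fitness], "OR")
hybrid = join_terms(
    [
        grouped(join_terms([female, healthy], "AND")),
        grouped(join_terms([female, fitness], "AND")),
    ],
    "OR",
)

print(precision)
print(reach)
print(hybrid)
```

Grouping every AND clause explicitly avoids relying on operator precedence, which this guide does not specify.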

Strategy Recommendations:

Approach            Audience Size   Match Quality   Use Case
AND (3+ clusters)   100K - 1M       Very High       Core ICP, high-value offers
AND (2 clusters)    500K - 5M       High            Standard campaigns
Hybrid              1M - 10M        Medium-High     A/B testing segments
OR                  5M+             Variable        Discovery, cold outreach

Expression Examples:

// Precision: Core ICP with all defining traits
"1000000145 AND 1000001556 AND 1000000045"  // 450K people

// With exclusions: Remove known non-converters
"(1000000145 AND 1000001556) AND NOT 1000002000"  // 380K people

// Multiple precise segments: Test different ICP hypotheses
"(1000000145 AND 1000001556 AND 1000000045) OR (1000000145 AND 1000001234 AND 1000001523)"  // 720K people

// Reach: Any matching characteristic
"1000000145 OR 1000001556 OR 1000000045"  // 8.5M people

Step 6: Enrich Lookalike Profiles (Optional)

Retrieve full profiles for the lookalike audience to validate the match quality or for further analysis.

Tool: get_person

Request:

{
  "jsonrpc": "2.0",
  "method": "tools/call",
  "params": {
    "name": "get_person",
    "arguments": {
      "person_ids": [99999, 88888, 77777],
      "domains": [
        "demographic", "interest", "affinity", "household"
      ],
      "format": "none"
    }
  },
  "id": 6
}

Response: (Same structure as Step 2)

Key Points:

  • You can request only specific domains instead of all domains
  • Compare lookalike profiles to original customer profiles to validate similarity
  • Use enriched data for personalized outreach campaigns
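
One simple way to compare a lookalike profile against the ICP is to measure what share of the defining attributes it carries. This is an illustrative metric, not one defined by the platform; `icp_match_rate` is a hypothetical helper that works on the flat `domains` mapping from the get_person response.

```python
def icp_match_rate(profile_domains, icp_attributes):
    """Return the fraction of ICP (name, value) pairs present in a profile."""
    icp_attributes = list(icp_attributes)
    if not icp_attributes:
        raise ValueError("icp_attributes must not be empty")
    hits = sum(
        1 for name, value in icp_attributes
        if profile_domains.get(name) == value
    )
    return hits / len(icp_attributes)


icp = [
    ("gender", "Female"),
    ("interested_healthy_living", "Yes"),
    ("fitness_affinity", "High"),
]

lookalike = {
    "gender": "Female",
    "generation": "Gen X",
    "interested_healthy_living": "Yes",
}

rate = icp_match_rate(lookalike, icp)
print(f"{rate:.2f}")  # 0.67
```

A profile returned by an AND expression over the same clusters would be expected to score 1.0; lower scores are more likely with OR or hybrid expressions.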

Performance Considerations

Batching

  • resolve_identities: No per-request limit, but use format: "json" for large datasets
  • get_person: Max 1000 person_ids per request; split larger datasets into chunks of 1000
  • find_persons: Returns up to 10 sample records inline; use export format for full results
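
Splitting a large ID list into get_person-sized batches can be sketched like this. The 1000-ID limit comes from this guide; the generator name `chunked` is illustrative.

```python
def chunked(ids, size=1000):
    """Yield consecutive slices of `ids`, each at most `size` long."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


person_ids = list(range(1, 2501))  # 2,500 placeholder IDs

batches = list(chunked(person_ids, size=1000))
print([len(batch) for batch in batches])  # [1000, 1000, 500]
```

Each batch can then be sent as the `person_ids` argument of a separate get_person request.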

Export Formats

  • "format": "none" - Inline response (best for < 100 records)
  • "format": "json" - S3 export as JSON (best for analysis)
  • "format": "csv" - S3 export as CSV (best for imports to other tools)
  • "format": "jsonl" - S3 export as JSON Lines (best for streaming/large datasets)

Workflow ID Tracking

Pass the workflow_id from the first response through all subsequent requests to track the entire workflow:

{
  "name": "get_person",
  "arguments": {
    "person_ids": [...],
    "domains": [...],
    "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
  }
}

This enables:

  • End-to-end request tracing
  • Performance analysis across tools
  • Feedback submission for data quality issues
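
Threading the workflow_id through each call can be handled in one place when building request payloads. The sketch below follows the JSON-RPC envelope used throughout this guide; `build_tool_call` is a hypothetical helper, and sending the payload is left to whatever client you use.

```python
import json


def build_tool_call(name, arguments, request_id, workflow_id=None):
    """Build a tools/call payload, adding workflow_id when one is known."""
    call_arguments = dict(arguments)  # copy so the caller's dict is untouched
    if workflow_id is not None:
        call_arguments["workflow_id"] = workflow_id
    return {
        "jsonrpc": "2.0",
        "method": "tools/call",
        "params": {"name": name, "arguments": call_arguments},
        "id": request_id,
    }


# workflow_id as returned in the first response of the workflow.
workflow_id = "550e8400-e29b-41d4-a716-446655440000"

payload = build_tool_call(
    "get_person",
    {"person_ids": [12345, 67890], "domains": ["demographic"], "format": "none"},
    request_id=2,
    workflow_id=workflow_id,
)
print(json.dumps(payload, indent=2))
```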

Common Variations

Geographic Targeting

Add location filters to find_persons by including a location object with latitude, longitude, radius, and unit (miles or km). This filters the lookalike audience to specific geographic areas.
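
A location filter might be attached to the find_persons arguments as sketched below. The four field names come from the sentence above; the assumption that the object sits under a `location` key inside `arguments`, and the range checks, are illustrative and should be confirmed against the tool schema. The coordinates are example values.

```python
def with_location(arguments, latitude, longitude, radius, unit="miles"):
    """Return a copy of find_persons arguments with a location filter added."""
    if unit not in ("miles", "km"):
        raise ValueError("unit must be 'miles' or 'km'")
    if not -90 <= latitude <= 90:
        raise ValueError("latitude must be between -90 and 90")
    if not -180 <= longitude <= 180:
        raise ValueError("longitude must be between -180 and 180")
    if radius <= 0:
        raise ValueError("radius must be positive")
    updated = dict(arguments)
    # Assumption: the filter is passed as a nested "location" object.
    updated["location"] = {
        "latitude": latitude,
        "longitude": longitude,
        "radius": radius,
        "unit": unit,
    }
    return updated


base_arguments = {
    "expression": "a1b2c3d4e5f67890a1b2c3d4e5f67890 AND b2c3d4e5f67890a1b2c3d4e5f67890a1",
    "identifier_type": "email",
}

local_arguments = with_location(base_arguments, 30.2672, -97.7431, 25, unit="miles")
print(local_arguments["location"])
```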

Exclude Existing Customers

Use NOT operators to exclude clusters that identify existing customers. First, identify clusters that represent your customer base, then exclude them using boolean NOT logic combined with AND.

Multi-Segment ICP

Analyze multiple customer segments separately, then combine their defining clusters:

  1. Run ICP analysis for each segment (e.g., high-value customers, frequent purchasers)
  2. Identify top clusters for each segment
  3. Build separate expressions for each segment
  4. Combine with OR logic: (segment1_clusters) OR (segment2_clusters)
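
Steps 3 and 4 above can be sketched as a small helper that builds one AND group per segment and joins the groups with OR. The segment names and the function names are hypothetical; the hashes are the sample values from Step 4.

```python
def segment_expression(cluster_hashes):
    """Build a parenthesized AND expression for one segment."""
    cluster_hashes = list(cluster_hashes)
    if not cluster_hashes:
        raise ValueError("a segment needs at least one cluster")
    return "(" + " AND ".join(cluster_hashes) + ")"


def combine_segments(segments):
    """Join per-segment expressions with OR, preserving insertion order."""
    if not segments:
        raise ValueError("at least one segment is required")
    return " OR ".join(segment_expression(h) for h in segments.values())


segments = {
    "high_value": [
        "a1b2c3d4e5f67890a1b2c3d4e5f67890",
        "b2c3d4e5f67890a1b2c3d4e5f67890a1",
        "c3d4e5f67890a1b2c3d4e5f67890a1b2",
    ],
    "frequent_purchasers": [
        "a1b2c3d4e5f67890a1b2c3d4e5f67890",
        "d4e5f67890a1b2c3d4e5f67890a1b2c3",
        "e5f67890a1b2c3d4e5f67890a1b2c3d4",
    ],
}

print(combine_segments(segments))
```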

Error Handling

Identity Resolution Failures

If stats.rate < 0.5, consider:

  • Identifier quality (are emails valid?)
  • Hash type mismatch (plaintext vs. hashed)
  • Formatting (phones should be digits only, no country code)
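
Phone formatting problems can often be fixed before calling resolve_identities. The sketch below assumes 10-digit North American numbers like those in the Step 1 example, with an optional leading country code of 1; other numbering plans would need different rules. `normalize_phone` is a hypothetical helper.

```python
import re


def normalize_phone(raw):
    """Return a 10-digit phone string, or None if the input cannot be normalized."""
    digits = re.sub(r"\D", "", raw)  # drop spaces, dashes, parentheses, plus signs
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # strip the country code
    return digits if len(digits) == 10 else None


raw_phones = ["+1 (555) 123-4567", "555.987.6543", "12345"]
cleaned = [p for p in (normalize_phone(r) for r in raw_phones) if p is not None]
print(cleaned)  # ['5551234567', '5559876543']
```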

Empty Lookalike Audience

If total: 0, consider:

  • Expression too restrictive (try OR instead of AND)
  • Rare cluster combinations
  • Unknown cluster sizes (use the get_cluster tool to check cluster size before building expressions)

Export URL Expiration

Export URLs expire after 1 hour. If expired:

  • Re-run the original request
  • Download and cache results immediately
  • Use workflow_id to track retries
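
Before downloading, the `expires_at` value from the export object can be checked so that an expired URL triggers a re-run instead of a failed download. This is a minimal sketch; `is_expired` and the 60-second safety margin are illustrative choices.

```python
from datetime import datetime, timedelta, timezone


def is_expired(expires_at, now=None, margin=timedelta(seconds=60)):
    """Return True if the export URL has expired or will within `margin`.

    `expires_at` is an ISO 8601 UTC timestamp such as
    "2025-11-20T13:00:00.000Z", as shown in the export object.
    """
    # Replace the trailing Z so older Python versions can parse the value.
    expiry = datetime.fromisoformat(expires_at.replace("Z", "+00:00"))
    if now is None:
        now = datetime.now(timezone.utc)
    return now + margin >= expiry


export = {"expires_at": "2025-11-20T13:00:00.000Z"}
check_time = datetime(2025, 11, 20, 12, 30, tzinfo=timezone.utc)

if is_expired(export["expires_at"], now=check_time):
    print("URL expired: re-run the original request")
else:
    print("URL still valid: download and cache the export now")
```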

Next Steps

After building your lookalike audience:

  1. Activate in marketing platforms - Export as CSV and upload to ad platforms
  2. Validate campaign performance - Track conversion rates and refine clusters
  3. Iterate on cluster selection - Test different cluster combinations
  4. Feedback loop - Use submit_feedback tool to report data quality issues
