ICP Analysis Workflow — Watt Data Docs

Analyze your existing customer base to identify its defining characteristics (your ideal customer profile, or ICP), then find lookalike audiences that share them.

Step 1: Resolve Identifiers

Use entity_resolve to convert customer identifiers, such as emails and phone numbers, into entity IDs. See the entity_resolve reference for full parameter details.

{
  "entity_type": "person",
  "identifiers": [
    { "id_type": "email", "hash_type": "plaintext", "values": ["customer1@example.com"] },
    { "id_type": "phone", "hash_type": "plaintext", "values": ["5551234567"] }
  ],
  "format": "json"
}

Passing multiple identifier types improves match rates. Keep only matches with overall_quality_score >= 0.5.
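The filtering step can be sketched in a few lines of Python: keep only matches whose overall_quality_score is at least 0.5 and collect their entity IDs for Step 2. The row shape used here (a dict with entity_id and overall_quality_score keys) is a hypothetical stand-in for the entity_resolve output, so check the entity_resolve reference for the actual field layout.

```python
from typing import Any

# Threshold recommended above for keeping a resolved match.
MIN_QUALITY = 0.5


def quality_entity_ids(rows: list[dict[str, Any]], min_quality: float = MIN_QUALITY) -> list[str]:
    """Return unique entity IDs whose overall_quality_score meets the threshold.

    Assumes a hypothetical row shape: {"entity_id": ..., "overall_quality_score": float}.
    IDs are returned in order of first appearance so results are deterministic.
    """
    seen: set[str] = set()
    kept: list[str] = []
    for row in rows:
        score = row.get("overall_quality_score")
        if score is None or score < min_quality:
            continue  # drop unscored or low-confidence matches
        entity_id = str(row["entity_id"])
        if entity_id not in seen:  # one customer may match on both email and phone
            seen.add(entity_id)
            kept.append(entity_id)
    return kept


if __name__ == "__main__":
    sample = [
        {"entity_id": "12345", "overall_quality_score": 0.92},
        {"entity_id": "67890", "overall_quality_score": 0.41},
        {"entity_id": "12345", "overall_quality_score": 0.88},
        {"entity_id": "11111", "overall_quality_score": 0.5},
    ]
    print(quality_entity_ids(sample))
```

Deduplicating matters when you pass several identifier types, because the same person may be matched more than once.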

Step 2: Profile Audience Traits

Enrich the entities resolved in Step 1 and aggregate their trait frequencies to characterize your audience.

{
  "entity_type": "person",
  "entity_ids": ["12345", "67890", "11111"],
  "domains": ["demographic", "interest", "affinity", "lifestyle", "household", "financial"],
  "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}

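The tool saves a trait_frequencies.parquet artifact to the workflow, and Step 3 reads it from there. For large datasets, pass entity_ids_uri (a URI pointing to a stored list of entity IDs) instead of an inline entity_ids list.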
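To make "aggregate trait frequencies" concrete, the sketch below counts how many audience members carry each trait and divides by audience size to get prevalence. The input shape (entity ID mapped to an iterable of trait hashes) and the output fields are illustrative assumptions; the real trait_frequencies.parquet schema may differ.

```python
from collections import Counter
from typing import Iterable, Mapping


def trait_frequencies(traits_by_entity: Mapping[str, Iterable[str]]) -> dict[str, dict[str, float]]:
    """Aggregate per-entity traits into counts and prevalence.

    traits_by_entity: hypothetical shape {entity_id: iterable of trait hashes}.
    Returns {trait_hash: {"count": n, "prevalence": n / audience_size}}.
    """
    audience_size = len(traits_by_entity)
    if audience_size == 0:
        return {}
    counts = Counter()
    for traits in traits_by_entity.values():
        counts.update(set(traits))  # count each trait at most once per entity
    return {
        trait: {"count": n, "prevalence": n / audience_size}
        for trait, n in counts.items()
    }


if __name__ == "__main__":
    audience = {"12345": {"a1", "b2"}, "67890": {"a1"}, "11111": {"a1", "c3"}}
    for trait, stats in sorted(trait_frequencies(audience).items()):
        print(trait, stats)
```

Counting each trait once per entity keeps prevalence in the 0 to 1 range, which is what a lift ratio against a baseline prevalence expects.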

Step 3: Calculate Trait Lift

Compare your audience's trait frequencies to the world baseline to find distinguishing traits.

{
  "entity_type": "person",
  "trait_frequencies_uri": "workflow://550e8400.../artifacts/trait_frequencies.parquet",
  "top_n": 15,
  "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}

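Each result includes lift, defined as audience prevalence divided by world prevalence. The higher the lift, the more distinctive the trait is of your audience; a lift of 1.0 means the trait is no more common in your audience than in the baseline. Use the hashes of the top-lift traits in Step 4.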
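The lift formula is easy to check by hand. This sketch computes lift for each trait as audience prevalence divided by world prevalence and returns the top_n traits by lift. The plain dict inputs stand in for the parquet artifacts and are an assumption, as are the made-up prevalences in the usage example; traits with zero or missing world prevalence are skipped because lift is undefined for them.

```python
def top_lift(
    audience_prevalence: dict[str, float],
    world_prevalence: dict[str, float],
    top_n: int = 15,
) -> list[tuple[str, float]]:
    """Rank traits by lift = audience prevalence / world prevalence.

    Traits missing from the baseline, or with zero baseline prevalence, are
    skipped. Ties are broken by trait hash so the ranking is deterministic.
    """
    lifts = []
    for trait, p_audience in audience_prevalence.items():
        p_world = world_prevalence.get(trait, 0.0)
        if p_world <= 0:
            continue  # lift is undefined without a positive baseline
        lifts.append((trait, p_audience / p_world))
    lifts.sort(key=lambda item: (-item[1], item[0]))
    return lifts[:top_n]


if __name__ == "__main__":
    # Illustrative numbers only, not real Watt Data prevalences.
    audience = {"c3d4e5f67890a1b2": 0.42, "a1b2c3d4e5f67890": 0.30, "ffff000011112222": 0.10}
    world = {"c3d4e5f67890a1b2": 0.06, "a1b2c3d4e5f67890": 0.10, "ffff000011112222": 0.10}
    for trait, lift in top_lift(audience, world, top_n=2):
        print(f"{trait}: lift {lift:.1f}")
```

In this toy data, a trait held by 42% of the audience but 6% of the baseline has a lift of about 7, while the last trait sits at 1.0 and carries no signal.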

Step 4: Find Lookalike Audience

Build a boolean expression from the high-lift trait hashes and query with entity_find.

{
  "entity_type": "person",
  "expression": "c3d4e5f67890a1b2 AND a1b2c3d4e5f67890 AND d4e5f67890a1b2c3",
  "identifier_types": ["email"],
  "format": "csv",
  "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}

Choose a targeting strategy based on your campaign goals:

Approach	Audience Size	Quality	Use Case
AND (3+ traits)	100K–1M	Very High	Core ICP, high-value offers
AND (2 traits)	500K–5M	High	Standard campaigns
OR	5M+	Variable	Discovery, cold outreach
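A small helper can turn a ranked list of trait hashes into the expressions described in the table: AND across the top few hashes for a core audience, OR for a broader discovery audience. The function name is hypothetical, and the sketch only assumes that hashes are joined with literal AND/OR keywords, as in the Step 4 example; it does not cover parentheses or mixed operators, so consult the entity_find reference before building more complex expressions.

```python
def build_expression(trait_hashes: list[str], operator: str = "AND", k: int = 3) -> str:
    """Join the top-k trait hashes with a single boolean operator for entity_find.

    operator: "AND" (narrower, higher-quality audience) or "OR" (broader audience).
    Assumes hashes are plain alphanumeric tokens, as in the Step 4 example.
    """
    op = operator.upper()
    if op not in {"AND", "OR"}:
        raise ValueError(f"unsupported operator: {operator!r}")
    if k < 1:
        raise ValueError("k must be at least 1")
    selected = trait_hashes[:k]
    if not selected:
        raise ValueError("no trait hashes to combine")
    for h in selected:
        if not h.isalnum():
            # Guard against stray spaces or operators ending up in the expression.
            raise ValueError(f"unexpected trait hash: {h!r}")
    return f" {op} ".join(selected)


if __name__ == "__main__":
    ranked = ["c3d4e5f67890a1b2", "a1b2c3d4e5f67890", "d4e5f67890a1b2c3"]
    request = {
        "entity_type": "person",
        "expression": build_expression(ranked, "AND", k=3),
        "identifier_types": ["email"],
        "format": "csv",
    }
    print(request["expression"])
```

Called with the three hashes above and k=3, the helper reproduces the expression from the Step 4 example; lowering k to 2 or switching to "OR" moves you down the table toward larger, looser audiences.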

Step 5: Validate Lookalike Profiles (Optional)

Enrich a sample of the lookalike audience with entity_enrich and compare to your original customer profiles.

{
  "entity_type": "person",
  "entity_ids": ["99999", "88888", "77777"],
  "domains": ["demographic", "interest", "affinity"]
}
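One way to make the comparison concrete is to compute each trait's prevalence in the original customers and in the lookalike sample, then flag traits whose prevalence differs by more than a tolerance. The 0.10 tolerance and the input shape (entity ID mapped to an iterable of traits) are illustrative assumptions, not documented defaults.

```python
from typing import Iterable, Mapping


def prevalence(traits_by_entity: Mapping[str, Iterable[str]]) -> dict[str, float]:
    """Share of entities carrying each trait (each trait counted once per entity)."""
    n = len(traits_by_entity)
    if n == 0:
        return {}
    counts: dict[str, int] = {}
    for traits in traits_by_entity.values():
        for trait in set(traits):
            counts[trait] = counts.get(trait, 0) + 1
    return {trait: c / n for trait, c in counts.items()}


def profile_drift(
    customers: Mapping[str, Iterable[str]],
    lookalikes: Mapping[str, Iterable[str]],
    tolerance: float = 0.10,
) -> dict[str, tuple[float, float]]:
    """Return traits whose prevalence differs by more than `tolerance`.

    Values are (customer prevalence, lookalike prevalence); a trait absent
    from one group counts as 0.0 in that group.
    """
    p_cust = prevalence(customers)
    p_look = prevalence(lookalikes)
    drift = {}
    for trait in sorted(set(p_cust) | set(p_look)):
        a, b = p_cust.get(trait, 0.0), p_look.get(trait, 0.0)
        if abs(a - b) > tolerance:
            drift[trait] = (a, b)
    return drift
```

An empty result suggests the sample resembles your customers on the enriched domains; a long list of drifting traits suggests tightening the Step 4 expression.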

Slicing by Geographic Dimension

The Step 2 → Step 3 pipeline also accepts "geo" in domains. Geo enrichment produces per-audience boundary memberships (state, dma, county, cbsa, msa, zip5, congressional_district), which calculate_trait_lift treats as ordinary trait rows.

{
  "entity_type": "person",
  "entity_ids_uri": "workflow://550e8400.../artifacts/resolved_identities.parquet",
  "domains": ["geo"],
  "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}

Passing the resulting trait_frequencies.parquet to calculate_trait_lift shows which states or DMAs over-index for the audience relative to the national baseline: the geographic equivalent of Step 3's "defining characteristics" output. To get both shape and place in a single pass, combine geo with the usual domains in one call ("domains": ["geo", "demographic", "interest"]). The geo domain is available for person entities only.
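Because geo is person-only and large audiences are passed by URI, a small request builder can enforce both rules before the enrichment call is made. This is a sketch: the function name is hypothetical, the payload keys mirror the JSON examples on this page, and requiring exactly one of entity_ids or entity_ids_uri is an assumption drawn from those examples.

```python
import json
from typing import Optional


def build_enrich_request(
    entity_type: str,
    domains: list[str],
    workflow_id: str,
    entity_ids: Optional[list[str]] = None,
    entity_ids_uri: Optional[str] = None,
) -> dict:
    """Assemble a Step 2 style payload, enforcing the person-only rule for geo."""
    if "geo" in domains and entity_type != "person":
        raise ValueError("the geo domain is only available for person entities")
    if (entity_ids is None) == (entity_ids_uri is None):
        raise ValueError("pass exactly one of entity_ids or entity_ids_uri")
    payload: dict = {"entity_type": entity_type}
    if entity_ids_uri is not None:
        payload["entity_ids_uri"] = entity_ids_uri  # suited to large datasets
    else:
        payload["entity_ids"] = list(entity_ids)
    payload["domains"] = list(domains)
    payload["workflow_id"] = workflow_id
    return payload


if __name__ == "__main__":
    request = build_enrich_request(
        entity_type="person",
        domains=["geo", "demographic", "interest"],
        workflow_id="550e8400-e29b-41d4-a716-446655440000",
        entity_ids=["12345", "67890", "11111"],
    )
    print(json.dumps(request, indent=2))
```

Failing early on a non-person geo request avoids a round trip whose output could not feed calculate_trait_lift anyway.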

Related guides: