Analyze your existing customer base to identify defining characteristics, then find lookalike audiences.
Step 1: Resolve Identifiers
Convert customer identifiers to entity IDs. See entity_resolve for full parameter details.
```json
{
  "entity_type": "person",
  "identifiers": [
    { "id_type": "email", "hash_type": "plaintext", "values": ["customer1@example.com"] },
    { "id_type": "phone", "hash_type": "plaintext", "values": ["5551234567"] }
  ],
  "format": "json"
}
```
Pass multiple identifier types for better match rates. Filter results by overall_quality_score >= 0.5.
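The sketch below shows one way to build the Step 1 payload with both identifier types and then apply the 0.5 quality cutoff. The payload fields come from the example above. The response shape is an assumption: each result is taken to be a dict with hypothetical `entity_id` and `overall_quality_score` keys, so check the entity_resolve reference for the real structure.

```python
from typing import Any

# Quality threshold recommended in this guide.
MIN_QUALITY = 0.5


def build_resolve_request(emails: list[str], phones: list[str]) -> dict[str, Any]:
    """Build an entity_resolve payload that passes every identifier type available."""
    identifiers = []
    if emails:
        identifiers.append({"id_type": "email", "hash_type": "plaintext", "values": emails})
    if phones:
        identifiers.append({"id_type": "phone", "hash_type": "plaintext", "values": phones})
    return {"entity_type": "person", "identifiers": identifiers, "format": "json"}


def keep_confident_matches(
    results: list[dict[str, Any]], min_quality: float = MIN_QUALITY
) -> list[str]:
    """Return entity IDs whose overall_quality_score meets the threshold.

    Assumes hypothetical result dicts with "entity_id" and
    "overall_quality_score" keys; results missing a score are dropped.
    """
    return [
        r["entity_id"]
        for r in results
        if r.get("overall_quality_score", 0.0) >= min_quality
    ]
```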
Step 2: Profile Audience Traits
Enrich resolved entities and aggregate trait frequencies to characterize your audience.
```json
{
  "entity_type": "person",
  "entity_ids": ["12345", "67890", "11111"],
  "domains": ["demographic", "interest", "affinity", "lifestyle", "household", "financial"],
  "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}
```
The tool persists a trait_frequencies.parquet artifact, which the next step consumes. For large datasets, pass entity_ids_uri instead of entity_ids.
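To make "aggregate trait frequencies" concrete, here is a minimal sketch of the computation: for each trait, the fraction of audience members who carry it. It assumes a hypothetical in-memory shape (entity ID mapped to a set of trait hashes); the actual schema of trait_frequencies.parquet is not documented here, and the tool computes this for you.

```python
from collections import Counter


def trait_frequencies(enriched: dict[str, set[str]]) -> dict[str, float]:
    """Fraction of audience members carrying each trait hash.

    `enriched` maps entity ID -> set of trait hashes (hypothetical shape).
    """
    total = len(enriched)
    if total == 0:
        return {}
    counts = Counter()
    for traits in enriched.values():
        counts.update(set(traits))  # count each trait at most once per person
    return {trait: n / total for trait, n in counts.items()}
```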
Step 3: Calculate Trait Lift
Compare your audience's trait frequencies to the world baseline to find distinguishing traits.
```json
{
  "entity_type": "person",
  "trait_frequencies_uri": "workflow://550e8400.../artifacts/trait_frequencies.parquet",
  "top_n": 15,
  "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}
```
Each result includes its lift: audience prevalence divided by world prevalence. The higher the lift, the more strongly the trait distinguishes your audience. Use the top trait hashes in Step 4.
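The lift ranking can be sketched in a few lines. This is an illustration of the ratio described above, not calculate_trait_lift's implementation; it assumes both inputs are plain dicts of trait hash to prevalence (a fraction between 0 and 1) and skips traits with no world baseline to avoid dividing by zero.

```python
def top_lift(
    audience: dict[str, float],
    world: dict[str, float],
    top_n: int = 15,
) -> list[tuple[str, float]]:
    """Rank traits by lift = audience prevalence / world prevalence."""
    lifts = [
        (trait, share / world[trait])
        for trait, share in audience.items()
        if world.get(trait, 0.0) > 0  # no baseline, no meaningful lift
    ]
    lifts.sort(key=lambda pair: pair[1], reverse=True)
    return lifts[:top_n]
```

A lift of 1.0 means the trait is exactly as common in your audience as in the baseline; values well above 1.0 are the defining traits worth carrying into Step 4.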
Step 4: Find Lookalike Audience
Build a boolean expression from the high-lift trait hashes and query with entity_find.
```json
{
  "entity_type": "person",
  "expression": "c3d4e5f67890a1b2 AND a1b2c3d4e5f67890 AND d4e5f67890a1b2c3",
  "identifier_types": ["email"],
  "format": "csv",
  "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}
```
Choose a targeting strategy based on your campaign goals:
| Approach | Audience Size | Quality | Use Case |
|---|---|---|---|
| AND (3+ traits) | 100K–1M | Very High | Core ICP, high-value offers |
| AND (2 traits) | 500K–5M | High | Standard campaigns |
| OR | 5M+ | Variable | Discovery, cold outreach |
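Building the expression string is just a join over the ranked hashes. The sketch below maps the strategies in the table above to a trait count and operator; the strategy names and the choice of exactly three traits for "core" are illustrative assumptions, and it only emits the flat AND/OR form shown in the Step 4 example.

```python
# Hypothetical strategy names -> (number of top-lift traits, operator).
# "core" uses exactly three traits here although the table allows 3+;
# "discovery" ORs every hash it is given.
STRATEGIES = {
    "core": (3, "AND"),
    "standard": (2, "AND"),
    "discovery": (None, "OR"),
}


def build_expression(trait_hashes: list[str], operator: str = "AND") -> str:
    """Join trait hashes into an entity_find expression string."""
    op = operator.upper()
    if op not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    if not trait_hashes:
        raise ValueError("need at least one trait hash")
    return f" {op} ".join(trait_hashes)


def expression_for_strategy(ranked_hashes: list[str], strategy: str) -> str:
    """Pick the top hashes (already sorted by lift) for a named strategy."""
    count, op = STRATEGIES[strategy]
    chosen = ranked_hashes if count is None else ranked_hashes[:count]
    return build_expression(chosen, op)


def build_find_request(expression: str, workflow_id: str) -> dict:
    """entity_find payload mirroring the Step 4 example."""
    return {
        "entity_type": "person",
        "expression": expression,
        "identifier_types": ["email"],
        "format": "csv",
        "workflow_id": workflow_id,
    }
```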
Step 5: Validate Lookalike Profiles (Optional)
Enrich a sample of the lookalike audience with entity_enrich and compare to your original customer profiles.
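One way to pick the sample and score the comparison is sketched below. Both helpers are hypothetical: sample_ids draws a reproducible random subset of lookalike IDs to send to entity_enrich, and prevalence_gap assumes you have reduced each group's enrichment output to trait-prevalence dicts (as in Step 2). Gaps near zero suggest the lookalikes resemble your customers on those traits.

```python
import random


def sample_ids(entity_ids: list[str], k: int, seed: int = 0) -> list[str]:
    """Draw a reproducible random sample of lookalike IDs to enrich."""
    rng = random.Random(seed)
    return rng.sample(entity_ids, min(k, len(entity_ids)))


def prevalence_gap(
    customers: dict[str, float],
    lookalikes: dict[str, float],
) -> dict[str, float]:
    """Lookalike prevalence minus customer prevalence for each customer trait.

    Traits absent from the lookalike sample count as prevalence 0.0.
    """
    return {t: lookalikes.get(t, 0.0) - share for t, share in customers.items()}
```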
```json
{
  "entity_type": "person",
  "entity_ids": ["99999", "88888", "77777"],
  "domains": ["demographic", "interest", "affinity"]
}
```
Slicing by geographic dimension
The same Step 2 → Step 3 pipeline accepts "geo" in domains, producing per-audience boundary memberships (state, dma, county, cbsa, msa, zip5, congressional_district) that flow through calculate_trait_lift as ordinary rows.
```json
{
  "entity_type": "person",
  "entity_ids_uri": "workflow://550e8400.../artifacts/resolved_identities.parquet",
  "domains": ["geo"],
  "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}
```
Chaining the resulting trait_frequencies.parquet into calculate_trait_lift surfaces the states or DMAs where the audience over-indexes relative to the national baseline. This is the geographic equivalent of Step 3's "defining characteristics" output. To get both shape and place in a single pass, combine geo with the usual domains in one call ("domains": ["geo", "demographic", "interest"]). The geo domain is available only for entity_type "person".
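Once geo rows come back from calculate_trait_lift, picking out the over-indexing boundaries of one type is a filter and a sort. This sketch assumes hypothetical row fields (`boundary_type`, `boundary_value`, `lift`); only the boundary type names are taken from this guide.

```python
# Boundary types listed in this guide for the "geo" domain.
GEO_BOUNDARIES = {"state", "dma", "county", "cbsa", "msa", "zip5", "congressional_district"}


def over_indexing(
    lift_rows: list[dict],
    boundary_type: str,
    min_lift: float = 1.0,
) -> list[tuple[str, float]]:
    """Boundaries of one type whose lift exceeds min_lift, highest first.

    Rows are assumed to look like (hypothetical field names):
    {"boundary_type": "state", "boundary_value": "CA", "lift": 1.8}
    """
    if boundary_type not in GEO_BOUNDARIES:
        raise ValueError(f"unknown boundary type: {boundary_type}")
    hits = [
        (row["boundary_value"], row["lift"])
        for row in lift_rows
        if row["boundary_type"] == boundary_type and row["lift"] > min_lift
    ]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```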
Related guides: