Enrich a set of entity profiles and compute trait frequency distributions across the audience for ideal customer profile (ICP) analysis.
Quick Example
{
"entity_type": "person",
"entity_ids": ["123", "456", "789"],
"domains": ["demographic", "affinity"],
"workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}
Input Parameters
| Parameter | Type | Required | Default | Constraints | Description |
|---|---|---|---|---|---|
| entity_type | string | Yes | - | "person" or "business" | Type of entity to enrich |
| entity_ids | array | Conditional | - | Array of strings or integers | Entity IDs (inline mode). Mutually exclusive with entity_ids_uri |
| entity_ids_uri | string | Conditional | - | workflow:// URI | CSV or Parquet with entity IDs. Mutually exclusive with entity_ids |
| entity_id_column | string | No | "entity_id" | Column name | Column containing entity IDs (only with entity_ids_uri) |
| domains | array | Yes | - | Min 1 trait domain valid for entity_type | Trait domains to aggregate (see "Trait domains" below) |
| trait_limit | number | No | - | Positive integer | Maximum traits to return in trait_frequencies |
| workflow_id | string | No | - | Valid UUID | Workflow ID for tracking and persistence |
Parameter Details:
entity_ids vs entity_ids_uri:
- Provide exactly one. They are mutually exclusive.
- entity_ids for small datasets (inline array)
- entity_ids_uri for chaining from entity_resolve or entity_find output (recommended)
- entity_ids_uri supports both .csv and .parquet files
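The mutual exclusivity described above can also be encoded in client-side types so that a request with both fields (or neither) fails to compile. This is a minimal sketch, not part of the tool's API: the names `InlineIds`, `UriIds`, `EntityIdInput`, and `inputMode` are hypothetical, and disallowing entity_id_column in inline mode is a choice of this sketch based on the "only with entity_ids_uri" note.

```typescript
// Hypothetical client-side types: exactly one of entity_ids / entity_ids_uri.
type InlineIds = {
  entity_ids: Array<string | number>;
  entity_ids_uri?: never;
  entity_id_column?: never; // documented as applying only with entity_ids_uri
};

type UriIds = {
  entity_ids?: never;
  entity_ids_uri: string;
  entity_id_column?: string; // documented default is "entity_id"
};

type EntityIdInput = InlineIds | UriIds;

// Report which input mode a request uses.
function inputMode(input: EntityIdInput): "inline" | "uri" {
  return input.entity_ids !== undefined ? "inline" : "uri";
}

const inlineInput: EntityIdInput = { entity_ids: ["123", 456] };
const chainedInput: EntityIdInput = {
  entity_ids_uri:
    "workflow://550e8400-e29b-41d4-a716-446655440000/artifacts/resolved_identities.parquet",
};

console.log(inputMode(inlineInput));  // "inline"
console.log(inputMode(chainedInput)); // "uri"
```

The server remains the authority on what is accepted; types like these only catch mistakes earlier.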
Trait domains:
The values allowed in domains depend on entity_type and are validated server-side:
- person → affinity, content, demographic, employment, financial, geo, household, intent, interest, lifestyle, political, purchase
- business → about, appstore, digital, funding, hiring, industry, techstack
At least one domain is required, and a value outside the allowed set for the chosen entity_type is rejected. geo is person-only — boundary bitmaps don't exist for businesses.
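A client can run the same per-entity check before sending a request. The sketch below copies the allowed sets from the lists above; `ALLOWED_DOMAINS` and `domainViolations` are hypothetical names, and server-side validation remains authoritative.

```typescript
type EntityType = "person" | "business";

// Allowed domains per entity type, copied from the documented lists.
const ALLOWED_DOMAINS: Record<EntityType, readonly string[]> = {
  person: [
    "affinity", "content", "demographic", "employment", "financial", "geo",
    "household", "intent", "interest", "lifestyle", "political", "purchase",
  ],
  business: ["about", "appstore", "digital", "funding", "hiring", "industry", "techstack"],
};

// Return the requested domains that are not legal for the given entity type.
function domainViolations(entityType: EntityType, domains: string[]): string[] {
  const allowed = new Set(ALLOWED_DOMAINS[entityType]);
  return domains.filter((d) => !allowed.has(d));
}

console.log(domainViolations("person", ["demographic", "geo"])); // []
console.log(domainViolations("business", ["industry", "geo"]));  // ["geo"]
```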
Request Schema:
interface GroupEntitiesByTraitParams {
entity_type: "person" | "business";
entity_ids?: Array<string | number>;
entity_ids_uri?: string;
entity_id_column?: string;
// Allowed values depend on entity_type — values outside the per-entity set
// are rejected with a validation error listing the legal values.
// person: Array<"affinity" | "content" | "demographic" | "employment" | "financial" | "geo" | "household" | "intent" | "interest" | "lifestyle" | "political" | "purchase">
// business: Array<"about" | "appstore" | "digital" | "funding" | "hiring" | "industry" | "techstack">
domains: Array<
| "affinity" | "content" | "demographic" | "employment" | "financial" | "geo"
| "household" | "intent" | "interest" | "lifestyle" | "political" | "purchase"
| "about" | "appstore" | "digital" | "funding" | "hiring" | "industry" | "techstack"
>;
trait_limit?: number;
workflow_id?: string;
}
Output Format
{
enrichment: {
total_entities: number;
enriched_entities: number;
profiles_with_traits: number;
enrichment_rate: number;
by_domain: Record<string, number>;
},
trait_frequencies: Array<{
trait_hash: string;
trait_name: string;
trait_value: string;
domain: string;
audience_count: number;
audience_prevalence: number;
}>,
resourceLinks: Array<{
uri: string;
name: string;
mimeType: string;
}>,
tool_trace_id: string,
workflow_id: string
}
Response Fields:
| Field | Type | Description |
|---|---|---|
| enrichment.total_entities | number | Total input entities |
| enrichment.enriched_entities | number | Profiles returned by entity_enrich (resolution count) |
| enrichment.profiles_with_traits | number | Profiles that produced ≥1 normalized field across the requested domains. Use this to detect a healthy resolution rate paired with zero trait yield: enrichment_rate near 1.0 with profiles_with_traits == 0 means resolution succeeded but no trait data was found |
| enrichment.enrichment_rate | number | Enrichment success rate (0-1) |
| enrichment.by_domain | object | Enriched count per domain |
| trait_frequencies | array | Trait frequency distribution for the audience |
| trait_frequencies[].trait_hash | string | Stable trait hash |
| trait_frequencies[].trait_name | string | Trait name |
| trait_frequencies[].trait_value | string | Trait value |
| trait_frequencies[].domain | string | Domain category |
| trait_frequencies[].audience_count | number | Entities with this trait |
| trait_frequencies[].audience_prevalence | number | Audience proportion (0-1) |
| resourceLinks | array | MCP resource links to persisted artifacts. Populated when a workflow_id is in scope for the call (provided by the caller or established by the workflow session); empty if validation fails |
| resourceLinks[].uri | string | Workflow resource URI (e.g. workflow://<workflow_id>/artifacts/trait_frequencies.parquet) |
| resourceLinks[].name | string | Artifact filename (trait_frequencies.parquet) |
| resourceLinks[].mimeType | string | MIME type (application/parquet) |
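The profiles_with_traits check described in the table can be sketched as a small helper. The function name `diagnoseYield`, its return labels, and the 0.8 threshold for a "healthy" enrichment_rate are illustrative assumptions, not documented values.

```typescript
interface EnrichmentSummary {
  total_entities: number;
  enriched_entities: number;
  profiles_with_traits: number;
  enrichment_rate: number;
  by_domain: Record<string, number>;
}

type YieldStatus = "ok" | "low-resolution" | "resolved-but-no-traits";

// Hypothetical helper: healthyRate is an illustrative threshold, not a documented one.
function diagnoseYield(e: EnrichmentSummary, healthyRate = 0.8): YieldStatus {
  if (e.enrichment_rate >= healthyRate && e.profiles_with_traits === 0) {
    // Resolution succeeded, but no trait data was found for the requested domains.
    return "resolved-but-no-traits";
  }
  if (e.enrichment_rate < healthyRate) {
    return "low-resolution";
  }
  return "ok";
}

const summary: EnrichmentSummary = {
  total_entities: 500,
  enriched_entities: 425,
  profiles_with_traits: 410,
  enrichment_rate: 0.85,
  by_domain: { demographic: 400, affinity: 380, intent: 350 },
};

console.log(diagnoseYield(summary)); // "ok"
```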
Example Response:
{
"enrichment": {
"total_entities": 500,
"enriched_entities": 425,
"profiles_with_traits": 410,
"enrichment_rate": 0.85,
"by_domain": {
"demographic": 400,
"affinity": 380,
"intent": 350
}
},
"trait_frequencies": [
{
"trait_hash": "a1b2c3d4e5f67890",
"trait_name": "tech_affinity",
"trait_value": "high",
"domain": "affinity",
"audience_count": 225,
"audience_prevalence": 0.45
},
{
"trait_hash": "b2c3d4e5f6789012",
"trait_name": "income_level",
"trait_value": "high",
"domain": "demographic",
"audience_count": 190,
"audience_prevalence": 0.38
}
],
"resourceLinks": [
{
"uri": "workflow://550e8400-e29b-41d4-a716-446655440000/artifacts/trait_frequencies.parquet",
"name": "trait_frequencies.parquet",
"mimeType": "application/parquet"
}
],
"tool_trace_id": "a1b2c3d4e5f6",
"workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}
Common Errors
| Condition | Error message |
|---|---|
| Neither entity_ids nor entity_ids_uri provided (or entity_ids is empty) | "Provide exactly one input: entity_ids or entity_ids_uri" |
| entity_ids_uri does not end in .csv or .parquet | "entity_ids_uri must point to a .csv or .parquet file, got: <uri>" |
| domains contains a value not allowed for the given entity_type | "Trait domains not allowed for entity_type='<entityType>'. Allowed: <allowed>. Violations: <violations>." |
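The first two conditions in the table can be caught client-side before a call is made. The sketch below is a hypothetical pre-check (`checkEntityInput` is not part of the tool). It assumes that supplying both inputs yields the same "exactly one" message and that the extension match is case-sensitive; neither detail is confirmed by the table.

```typescript
interface EntityInputFields {
  entity_ids?: Array<string | number>;
  entity_ids_uri?: string;
}

// Returns an error message modeled on the table above, or null if the input looks valid.
function checkEntityInput(p: EntityInputFields): string | null {
  const hasInline = Array.isArray(p.entity_ids) && p.entity_ids.length > 0;
  const uri = p.entity_ids_uri ?? "";
  const hasUri = uri.length > 0;

  // Neither provided, or (assumption) both provided.
  if (hasInline === hasUri) {
    return "Provide exactly one input: entity_ids or entity_ids_uri";
  }
  // Assumption: the check is a case-sensitive suffix match.
  if (hasUri && !/\.(csv|parquet)$/.test(uri)) {
    return `entity_ids_uri must point to a .csv or .parquet file, got: ${uri}`;
  }
  return null;
}

console.log(checkEntityInput({ entity_ids: [] }));
console.log(checkEntityInput({ entity_ids_uri: "workflow://wf/artifacts/ids.json" }));
console.log(checkEntityInput({ entity_ids: ["123"] })); // null
```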
Chaining to calculate_trait_lift
When a workflow_id is provided, group_entities_by_trait persists a trait_frequencies.parquet artifact. Pass the resource URI to calculate_trait_lift:
{
"entity_type": "person",
"trait_frequencies_uri": "workflow://550e8400-e29b-41d4-a716-446655440000/artifacts/trait_frequencies.parquet"
}
Clients can pick up the artifact two ways; both reference the same file:
- The workflow://…/trait_frequencies.parquet URI shown above (constructable directly from workflow_id).
- The resourceLinks[0].uri returned in the tool response, fetched via the MCP resource protocol.
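Both routes can be combined in a small client helper: prefer the link returned by the tool and fall back to a URI built from workflow_id. This is a sketch under the assumption that the URI pattern in the Response Fields table holds; `traitFrequenciesUri` and `resolveArtifactUri` are hypothetical names.

```typescript
interface ResourceLink {
  uri: string;
  name: string;
  mimeType: string;
}

// Build the artifact URI from a workflow_id, following the documented pattern.
function traitFrequenciesUri(workflowId: string): string {
  return `workflow://${workflowId}/artifacts/trait_frequencies.parquet`;
}

// Prefer the link from the tool response; otherwise construct the URI.
function resolveArtifactUri(workflowId: string, links: ResourceLink[]): string {
  const link = links.find((l) => l.name === "trait_frequencies.parquet");
  return link?.uri ?? traitFrequenciesUri(workflowId);
}

const workflowId = "550e8400-e29b-41d4-a716-446655440000";

// Shape of the calculate_trait_lift input shown above.
const liftParams = {
  entity_type: "person",
  trait_frequencies_uri: resolveArtifactUri(workflowId, []),
};

console.log(liftParams.trait_frequencies_uri);
```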
Usage Examples
Example 1: Inline entity IDs
{
"entity_type": "person",
"entity_ids": ["123", "456", "789"],
"domains": ["demographic", "affinity", "intent"]
}
Example 2: From entity_resolve output
{
"entity_type": "person",
"entity_ids_uri": "workflow://550e8400-e29b-41d4-a716-446655440000/artifacts/resolved_identities.parquet",
"entity_id_column": "entity_id",
"domains": ["demographic", "affinity", "interest"],
"workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}
Example 3: Limited trait output
{
"entity_type": "person",
"entity_ids": ["123", "456"],
"domains": ["demographic"],
"trait_limit": 20
}
Group by geographic dimension
Passing "geo" in domains aggregates the audience along boundary types (state, zip5, county, dma, cbsa, msa, congressional_district). Each entity contributes one row per boundary it belongs to, so the resulting trait_frequencies answers "where does this audience concentrate."
geo composes with other domains in the same call — the tool runs the profile-enrichment pass and the boundary-bitmap pass in parallel and merges the results. A geo-only call skips profile enrichment entirely.
Example: group an audience by state and DMA
{
"entity_type": "person",
"entity_ids_uri": "workflow://550e8400-e29b-41d4-a716-446655440000/artifacts/resolved_identities.parquet",
"domains": ["geo"],
"workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}
The returned trait_frequencies rows use trait_name = <boundary_type> and trait_value = <boundary_value>:
{
"trait_frequencies": [
{
"trait_hash": "<geo.state=CA hash>",
"trait_name": "state",
"trait_value": "CA",
"domain": "geo",
"audience_count": 1820,
"audience_prevalence": 0.36
},
{
"trait_hash": "<geo.dma=803 hash>",
"trait_name": "dma",
"trait_value": "803",
"domain": "geo",
"audience_count": 510,
"audience_prevalence": 0.10
}
]
}
Chain the parquet artifact into calculate_trait_lift to surface which states or DMAs over-index for the audience versus the national baseline.
geo is person-only. A business call with "geo" in domains is rejected at validation.
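Because geo rows share the trait_frequencies array with any other requested domains, a client typically filters by domain and boundary type before ranking. A minimal sketch, assuming the response has already been parsed into objects; `topGeo` is a hypothetical helper, and the trait_hash values below are placeholders.

```typescript
interface TraitFrequency {
  trait_hash: string;
  trait_name: string;
  trait_value: string;
  domain: string;
  audience_count: number;
  audience_prevalence: number;
}

// Top-n values for one boundary type, sorted by audience_count descending.
// filter() returns a new array, so the caller's rows are not reordered.
function topGeo(rows: TraitFrequency[], boundaryType: string, n: number): TraitFrequency[] {
  return rows
    .filter((r) => r.domain === "geo" && r.trait_name === boundaryType)
    .sort((a, b) => b.audience_count - a.audience_count)
    .slice(0, n);
}

const rows: TraitFrequency[] = [
  { trait_hash: "h1", trait_name: "dma", trait_value: "803", domain: "geo", audience_count: 510, audience_prevalence: 0.10 },
  { trait_hash: "h2", trait_name: "state", trait_value: "CA", domain: "geo", audience_count: 1820, audience_prevalence: 0.36 },
  { trait_hash: "h3", trait_name: "income_level", trait_value: "high", domain: "demographic", audience_count: 190, audience_prevalence: 0.38 },
];

console.log(topGeo(rows, "state", 5).map((r) => r.trait_value)); // ["CA"]
```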