group_entities_by_trait — Watt Data Docs

Enrich a set of entity profiles and compute trait frequency distributions across the audience for ideal customer profile (ICP) analysis.

Quick Example

{
  "entity_type": "person",
  "entity_ids": ["123", "456", "789"],
  "domains": ["demographic", "affinity"],
  "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}

Input Parameters

Parameter	Type	Required	Default	Constraints	Description
entity_type	string	Yes	-	"person" or "business"	Type of entity to enrich
entity_ids	array	Conditional	-	Array of strings or integers	Entity IDs (inline mode). Mutually exclusive with entity_ids_uri
entity_ids_uri	string	Conditional	-	workflow:// URI	CSV or Parquet with entity IDs. Mutually exclusive with entity_ids
entity_id_column	string	No	"entity_id"	Column name	Column containing entity IDs (only with entity_ids_uri)
domains	array	Yes	-	Min 1 trait domain valid for `entity_type`	Trait domains to aggregate (see "Trait domains" below)
trait_limit	number	No	-	Positive integer	Maximum traits to return in trait_frequencies
workflow_id	string	No	-	Valid UUID	Workflow ID for tracking and persistence

Parameter Details:

entity_ids vs entity_ids_uri:

Provide exactly one. They are mutually exclusive.
Use entity_ids for small datasets (inline array).
Use entity_ids_uri when chaining from entity_resolve or entity_find output (recommended).
entity_ids_uri accepts both .csv and .parquet files.
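
The exactly-one rule above can also be mirrored on the client before a request is sent. The following is a minimal TypeScript sketch, not part of the tool's API: the names InlineInput, UriInput, EntityIdInput, and hasExactlyOneInput are hypothetical, and treating an empty entity_ids array as "not provided" is an assumption based on the Common Errors table.

```typescript
// Hypothetical client-side types; only the field names come from the docs.
type InlineInput = {
  entity_ids: Array<string | number>;
  entity_ids_uri?: never;
};

type UriInput = {
  entity_ids_uri: string;
  entity_id_column?: string; // only meaningful together with entity_ids_uri
  entity_ids?: never;
};

// A value of this type can carry one input mode, never both.
type EntityIdInput = InlineInput | UriInput;

// Runtime guard for untyped input (for example, parsed JSON).
// Assumption: an empty entity_ids array counts as "not provided".
function hasExactlyOneInput(params: {
  entity_ids?: unknown;
  entity_ids_uri?: unknown;
}): boolean {
  const hasInline =
    Array.isArray(params.entity_ids) && params.entity_ids.length > 0;
  const hasUri =
    typeof params.entity_ids_uri === "string" &&
    params.entity_ids_uri.length > 0;
  // True only when exactly one of the two modes is present.
  return hasInline !== hasUri;
}

// Usage: the union type rejects objects that set both fields at compile time.
const inlineInput: EntityIdInput = { entity_ids: ["123", "456", "789"] };
```

The union type catches mistakes at compile time, while the guard covers data that arrives untyped; the server-side validation still decides what is actually accepted.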

Trait domains:

The values allowed in domains depend on entity_type and are validated server-side:

person → affinity, content, demographic, employment, financial, geo, household, intent, interest, lifestyle, political, purchase
business → about, appstore, digital, funding, hiring, industry, techstack

At least one domain is required, and a value outside the allowed set for the chosen entity_type is rejected. geo is person-only — boundary bitmaps don't exist for businesses.
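
A local pre-check can surface domain mistakes before the request reaches the server. This is a hedged sketch rather than the tool's own validator: ALLOWED_DOMAINS, findDomainViolations, and domainsLookValid are hypothetical names, and the lists are copied from the two lines above, so they would need updating if the server-side sets change.

```typescript
type EntityType = "person" | "business";

// Allowed domains per entity type, copied from the lists in this section.
const ALLOWED_DOMAINS: Record<EntityType, readonly string[]> = {
  person: [
    "affinity", "content", "demographic", "employment", "financial", "geo",
    "household", "intent", "interest", "lifestyle", "political", "purchase",
  ],
  business: [
    "about", "appstore", "digital", "funding", "hiring", "industry", "techstack",
  ],
};

// Returns the requested domains that are not allowed for entityType.
// An empty result means the domains pass this local check.
function findDomainViolations(
  entityType: EntityType,
  domains: readonly string[],
): string[] {
  const allowed = ALLOWED_DOMAINS[entityType];
  return domains.filter((d) => allowed.indexOf(d) === -1);
}

// Local pre-check only: at least one domain, and no violations.
// The server remains the source of truth.
function domainsLookValid(
  entityType: EntityType,
  domains: readonly string[],
): boolean {
  return (
    domains.length >= 1 &&
    findDomainViolations(entityType, domains).length === 0
  );
}
```

For example, findDomainViolations("business", ["geo", "industry"]) reports "geo" as the only violation, which matches the person-only rule for geo.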

Request Schema:

interface GroupEntitiesByTraitParams {
  entity_type: "person" | "business";
  entity_ids?: Array<string | number>;
  entity_ids_uri?: string;
  entity_id_column?: string;
  // Allowed values depend on entity_type — values outside the per-entity set
  // are rejected with a validation error listing the legal values.
  // person: Array<"affinity" | "content" | "demographic" | "employment" | "financial" | "geo" | "household" | "intent" | "interest" | "lifestyle" | "political" | "purchase">
  // business: Array<"about" | "appstore" | "digital" | "funding" | "hiring" | "industry" | "techstack">
  domains: Array<
    | "affinity" | "content" | "demographic" | "employment" | "financial" | "geo"
    | "household" | "intent" | "interest" | "lifestyle" | "political" | "purchase"
    | "about" | "appstore" | "digital" | "funding" | "hiring" | "industry" | "techstack"
  >;
  trait_limit?: number;
  workflow_id?: string;
}

Output Format

{
  enrichment: {
    total_entities: number;
    enriched_entities: number;
    profiles_with_traits: number;
    enrichment_rate: number;
    by_domain: Record<string, number>;
  },
  trait_frequencies: Array<{
    trait_hash: string;
    trait_name: string;
    trait_value: string;
    domain: string;
    audience_count: number;
    audience_prevalence: number;
  }>,
  resourceLinks: Array<{
    uri: string;
    name: string;
    mimeType: string;
  }>,
  tool_trace_id: string,
  workflow_id: string
}

Response Fields:

Field	Type	Description
enrichment.total_entities	number	Total input entities
enrichment.enriched_entities	number	Profiles returned by entity_enrich (resolution count)
enrichment.profiles_with_traits	number	Profiles that produced ≥1 normalized field across the requested domains. Use this to detect a healthy resolution rate paired with zero trait yield: `enrichment_rate` near 1.0 with `profiles_with_traits == 0` means resolution succeeded but no trait data was found
enrichment.enrichment_rate	number	Enrichment success rate (0-1)
enrichment.by_domain	object	Enriched count per domain
trait_frequencies	array	Trait frequency distribution for the audience
trait_frequencies[].trait_hash	string	Stable trait hash
trait_frequencies[].trait_name	string	Trait name
trait_frequencies[].trait_value	string	Trait value
trait_frequencies[].domain	string	Domain category
trait_frequencies[].audience_count	number	Entities with this trait
trait_frequencies[].audience_prevalence	number	Audience proportion (0-1)
resourceLinks	array	MCP resource links to persisted artifacts. Populated when a workflow_id is in scope for the call (provided by the caller or established by the workflow session); empty if validation fails
resourceLinks[].uri	string	Workflow resource URI (e.g. `workflow://<workflow_id>/artifacts/trait_frequencies.parquet`)
resourceLinks[].name	string	Artifact filename (`trait_frequencies.parquet`)
resourceLinks[].mimeType	string	MIME type (`application/parquet`)
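
The profiles_with_traits check described in the table can be expressed as a small helper. This is an illustrative sketch: EnrichmentSummary mirrors the documented enrichment object, while diagnoseEnrichment, its return labels, and the minRate threshold are hypothetical and not values defined by the tool.

```typescript
// Shape of the enrichment summary, as listed in the Response Fields table.
interface EnrichmentSummary {
  total_entities: number;
  enriched_entities: number;
  profiles_with_traits: number;
  enrichment_rate: number;
  by_domain: Record<string, number>;
}

type EnrichmentDiagnosis = "ok" | "resolved_but_no_traits" | "low_resolution";

// minRate is a hypothetical cutoff for "healthy" resolution, not a documented value.
function diagnoseEnrichment(
  e: EnrichmentSummary,
  minRate = 0.5,
): EnrichmentDiagnosis {
  // Few entities resolved at all: trait yield cannot be judged yet.
  if (e.enrichment_rate < minRate) return "low_resolution";
  // Resolution succeeded but no profile produced a normalized field.
  if (e.profiles_with_traits === 0) return "resolved_but_no_traits";
  return "ok";
}
```

A "resolved_but_no_traits" result suggests revisiting the requested domains rather than the input IDs, since resolution itself succeeded.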

Example Response:

{
  "enrichment": {
    "total_entities": 500,
    "enriched_entities": 425,
    "profiles_with_traits": 410,
    "enrichment_rate": 0.85,
    "by_domain": {
      "demographic": 400,
      "affinity": 380,
      "intent": 350
    }
  },
  "trait_frequencies": [
    {
      "trait_hash": "a1b2c3d4e5f67890",
      "trait_name": "tech_affinity",
      "trait_value": "high",
      "domain": "affinity",
      "audience_count": 225,
      "audience_prevalence": 0.45
    },
    {
      "trait_hash": "b2c3d4e5f6789012",
      "trait_name": "income_level",
      "trait_value": "high",
      "domain": "demographic",
      "audience_count": 190,
      "audience_prevalence": 0.38
    }
  ],
  "resourceLinks": [
    {
      "uri": "workflow://550e8400-e29b-41d4-a716-446655440000/artifacts/trait_frequencies.parquet",
      "name": "trait_frequencies.parquet",
      "mimeType": "application/parquet"
    }
  ],
  "tool_trace_id": "a1b2c3d4e5f6",
  "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}
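
A response like the one above is often easier to read once the rows are grouped by domain and ranked. The sketch below shows one way to do that on the client; TraitFrequency mirrors the documented row shape, and topTraitsByDomain is a hypothetical helper, not something the tool provides.

```typescript
// One row of trait_frequencies, as documented in Output Format.
interface TraitFrequency {
  trait_hash: string;
  trait_name: string;
  trait_value: string;
  domain: string;
  audience_count: number;
  audience_prevalence: number;
}

// Groups rows by domain, then keeps the perDomain most prevalent rows
// in each group, highest audience_prevalence first.
function topTraitsByDomain(
  rows: readonly TraitFrequency[],
  perDomain: number,
): Record<string, TraitFrequency[]> {
  const grouped: Record<string, TraitFrequency[]> = {};
  for (const row of rows) {
    const bucket = grouped[row.domain] ?? [];
    bucket.push(row);
    grouped[row.domain] = bucket;
  }

  const result: Record<string, TraitFrequency[]> = {};
  for (const domain of Object.keys(grouped)) {
    const bucket = grouped[domain] ?? [];
    result[domain] = bucket
      .slice() // copy so the grouped buckets are not reordered in place
      .sort((a, b) => b.audience_prevalence - a.audience_prevalence)
      .slice(0, perDomain);
  }
  return result;
}
```

This ranking happens after the call and is independent of trait_limit, which caps what the tool returns in the first place.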

Common Errors

Condition	Error message
Neither `entity_ids` nor `entity_ids_uri` provided (or `entity_ids` is empty)	`"Provide exactly one input: entity_ids or entity_ids_uri"`
`entity_ids_uri` does not end in `.csv` or `.parquet`	`"entity_ids_uri must point to a .csv or .parquet file, got: <uri>"`
`domains` contains a value not allowed for the given `entity_type`	`"Trait domains not allowed for entity_type='<entityType>'. Allowed: <allowed>. Violations: <violations>."`
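
Because each documented message starts with fixed text, a client can map it to a stable category for logging or retry logic. This is a sketch under assumptions: it presumes the error message is available as a plain string (how errors are transported is not specified here), and classifyError and its category labels are hypothetical.

```typescript
type KnownError =
  | "input_selection"
  | "bad_uri_extension"
  | "domain_not_allowed"
  | "unknown";

// Matches on the fixed leading text of each documented message.
// The variable parts (<uri>, <entityType>, <allowed>, <violations>) are ignored.
function classifyError(message: string): KnownError {
  if (/^Provide exactly one input/.test(message)) {
    return "input_selection";
  }
  if (/^entity_ids_uri must point to a \.csv or \.parquet file/.test(message)) {
    return "bad_uri_extension";
  }
  if (/^Trait domains not allowed for entity_type=/.test(message)) {
    return "domain_not_allowed";
  }
  // Anything else is outside the three conditions listed in the table.
  return "unknown";
}
```

Matching on prefixes is brittle if the wording changes, so treat "unknown" as a normal outcome rather than a failure of the classifier.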

Chaining to calculate_trait_lift

When a workflow_id is provided, group_entities_by_trait persists a trait_frequencies.parquet artifact. Pass the resource URI to calculate_trait_lift:

{
  "entity_type": "person",
  "trait_frequencies_uri": "workflow://550e8400-e29b-41d4-a716-446655440000/artifacts/trait_frequencies.parquet"
}

Clients can pick up the artifact in two ways; both reference the same file:

The workflow://…/trait_frequencies.parquet URI shown above (constructable directly from workflow_id).
The resourceLinks[0].uri returned in the tool response, fetched via the MCP resource protocol.
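
Both pickup routes can be combined in one helper that prefers the returned link and falls back to the constructed URI. The sketch below is illustrative: GroupResult, traitFrequenciesUri, and buildTraitLiftParams are hypothetical names, and the calculate_trait_lift parameters are limited to the two fields shown in the example above.

```typescript
interface ResourceLink {
  uri: string;
  name: string;
  mimeType: string;
}

// Only the response fields this helper needs.
interface GroupResult {
  resourceLinks: ResourceLink[];
  workflow_id: string;
}

// URI layout taken from the chaining example in this section.
function traitFrequenciesUri(workflowId: string): string {
  return `workflow://${workflowId}/artifacts/trait_frequencies.parquet`;
}

// Builds calculate_trait_lift params from a group_entities_by_trait result.
// Prefers resourceLinks[0].uri and falls back to the constructed URI
// when no link was returned.
function buildTraitLiftParams(
  entityType: "person" | "business",
  result: GroupResult,
): { entity_type: "person" | "business"; trait_frequencies_uri: string } {
  const link = result.resourceLinks[0];
  return {
    entity_type: entityType,
    trait_frequencies_uri: link
      ? link.uri
      : traitFrequenciesUri(result.workflow_id),
  };
}
```

The fallback only helps if the artifact was actually persisted, so an empty resourceLinks array is still worth checking before chaining.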

Usage Examples

Example 1: Inline entity IDs

{
  "entity_type": "person",
  "entity_ids": ["123", "456", "789"],
  "domains": ["demographic", "affinity", "intent"]
}

Example 2: From entity_resolve output

{
  "entity_type": "person",
  "entity_ids_uri": "workflow://550e8400-e29b-41d4-a716-446655440000/artifacts/resolved_identities.parquet",
  "entity_id_column": "entity_id",
  "domains": ["demographic", "affinity", "interest"],
  "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}

Example 3: Limited trait output

{
  "entity_type": "person",
  "entity_ids": ["123", "456"],
  "domains": ["demographic"],
  "trait_limit": 20
}

Grouping by geo

Passing "geo" in domains aggregates the audience along boundary types (state, zip5, county, dma, cbsa, msa, congressional_district). Each entity contributes one row per boundary it belongs to, so the resulting trait_frequencies answers the question "where does this audience concentrate?"

geo composes with other domains in the same call — the tool runs the profile-enrichment pass and the boundary-bitmap pass in parallel and merges the results. A geo-only call skips profile enrichment entirely.
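
The pass selection described above can be pictured with a small planner. This is a hypothetical model of the described behaviour, not the tool's implementation: planPasses and its field names are invented for illustration, and it only decides which passes a given domains list would need.

```typescript
// Hypothetical description of which passes a call would trigger.
interface PassPlan {
  profileEnrichment: boolean; // needed when any non-geo domain is requested
  boundaryBitmap: boolean; // needed when "geo" is requested
}

function planPasses(domains: readonly string[]): PassPlan {
  return {
    // A geo-only call leaves this false, matching "skips profile enrichment".
    profileEnrichment: domains.some((d) => d !== "geo"),
    boundaryBitmap: domains.indexOf("geo") !== -1,
  };
}

// Usage: a mixed call needs both passes, a geo-only call needs one.
const mixedPlan = planPasses(["demographic", "geo"]);
const geoOnlyPlan = planPasses(["geo"]);
```

Under this model, mixing geo with other domains costs one call instead of two, with both result sets landing in the same trait_frequencies array.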

Example: group an audience by state and DMA

{
  "entity_type": "person",
  "entity_ids_uri": "workflow://550e8400-e29b-41d4-a716-446655440000/artifacts/resolved_identities.parquet",
  "domains": ["geo"],
  "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}

The returned trait_frequencies rows use trait_name = <boundary_type> and trait_value = <boundary_value>:

{
  "trait_frequencies": [
    {
      "trait_hash": "<geo.state=CA hash>",
      "trait_name": "state",
      "trait_value": "CA",
      "domain": "geo",
      "audience_count": 1820,
      "audience_prevalence": 0.36
    },
    {
      "trait_hash": "<geo.dma=803 hash>",
      "trait_name": "dma",
      "trait_value": "803",
      "domain": "geo",
      "audience_count": 510,
      "audience_prevalence": 0.10
    }
  ]
}
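
Since a geo call returns rows for every boundary type in one array, clients usually filter to a single boundary type before ranking. The helper below is a minimal sketch of that step; GeoRow lists only the fields it reads, and rankBoundary is a hypothetical name.

```typescript
// Subset of a trait_frequencies row used by this helper.
interface GeoRow {
  trait_name: string; // boundary type, e.g. "state" or "dma"
  trait_value: string; // boundary value, e.g. "CA" or "803"
  domain: string;
  audience_count: number;
  audience_prevalence: number;
}

// Returns the geo rows for one boundary type, highest prevalence first.
// filter() creates a new array, so the input is not reordered.
function rankBoundary(
  rows: readonly GeoRow[],
  boundaryType: string,
): GeoRow[] {
  return rows
    .filter((r) => r.domain === "geo" && r.trait_name === boundaryType)
    .sort((a, b) => b.audience_prevalence - a.audience_prevalence);
}
```

Calling rankBoundary(rows, "state") on the response above would keep the CA row and drop the DMA row; the same call with "dma" does the reverse.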

Chain the parquet artifact into calculate_trait_lift to surface which states or DMAs over-index for the audience versus the national baseline.

geo is person-only. A business call with "geo" in domains is rejected at validation.