Watt Data

group_entities_by_trait enriches a set of entity profiles and computes trait frequency distributions across the audience, for ideal customer profile (ICP) analysis.

Quick Example

{
  "entity_type": "person",
  "entity_ids": ["123", "456", "789"],
  "domains": ["demographic", "affinity"],
  "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}

Input Parameters

| Parameter | Type | Required | Default | Constraints | Description |
|---|---|---|---|---|---|
| entity_type | string | Yes | - | "person" or "business" | Type of entity to enrich |
| entity_ids | array | Conditional | - | Array of strings or integers | Entity IDs (inline mode). Mutually exclusive with entity_ids_uri |
| entity_ids_uri | string | Conditional | - | workflow:// URI | CSV or Parquet with entity IDs. Mutually exclusive with entity_ids |
| entity_id_column | string | No | "entity_id" | Column name | Column containing entity IDs (only with entity_ids_uri) |
| domains | array | Yes | - | Min 1 trait domain valid for entity_type | Trait domains to aggregate (see "Trait domains" below) |
| trait_limit | number | No | - | Positive integer | Maximum traits to return in trait_frequencies |
| workflow_id | string | No | - | Valid UUID | Workflow ID for tracking and persistence |

Parameter Details:

entity_ids vs entity_ids_uri:

  • Provide exactly one. They are mutually exclusive.
  • entity_ids for small datasets (inline array)
  • entity_ids_uri for chaining from entity_resolve or entity_find output (recommended)
  • entity_ids_uri supports both .csv and .parquet files
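
The exactly-one rule and the file-extension rule can be pre-checked on the client before a request is sent. The following is a minimal sketch, not the server's implementation: `checkEntityInput` and `EntityInput` are hypothetical names, the messages are copied from Common Errors, and treating "both provided" the same as "neither provided" is an assumption. The extension match is also assumed to be case-sensitive.

```typescript
// Hypothetical client-side pre-check; the server performs the authoritative validation.
interface EntityInput {
  entity_ids?: Array<string | number>;
  entity_ids_uri?: string;
}

// Returns an error message, or null when exactly one usable input is present.
function checkEntityInput(input: EntityInput): string | null {
  // An empty inline array counts as "not provided", per Common Errors.
  const hasIds = input.entity_ids !== undefined && input.entity_ids.length > 0;
  const hasUri = input.entity_ids_uri !== undefined && input.entity_ids_uri.length > 0;

  if (hasIds === hasUri) {
    // Covers "neither" and "both"; reusing this message for "both" is an assumption.
    return "Provide exactly one input: entity_ids or entity_ids_uri";
  }
  if (hasUri) {
    const uri = input.entity_ids_uri as string;
    // Only .csv and .parquet files are documented as supported.
    if (!/\.(csv|parquet)$/.test(uri)) {
      return "entity_ids_uri must point to a .csv or .parquet file, got: " + uri;
    }
  }
  return null;
}
```

A null result only means the request passes these two documented checks; the server may still reject it for other reasons.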

Trait domains:

The values allowed in domains depend on entity_type and are validated server-side:

  • person: affinity, content, demographic, employment, financial, geo, household, intent, interest, lifestyle, political, purchase
  • business: about, appstore, digital, funding, hiring, industry, techstack

At least one domain is required, and a value outside the allowed set for the chosen entity_type is rejected. geo is person-only — boundary bitmaps don't exist for businesses.
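
The per-entity domain rule can be mirrored on the client to catch mistakes before the server does. This is a sketch under stated assumptions: `ALLOWED_DOMAINS` and `findDomainViolations` are hypothetical names, and the lists are copied from the ones above, so they need updating if the server-side sets change.

```typescript
type EntityType = "person" | "business";

// Allowed domains per entity type, copied from the lists above (may drift from the server).
const ALLOWED_DOMAINS: Record<EntityType, string[]> = {
  person: [
    "affinity", "content", "demographic", "employment", "financial", "geo",
    "household", "intent", "interest", "lifestyle", "political", "purchase",
  ],
  business: ["about", "appstore", "digital", "funding", "hiring", "industry", "techstack"],
};

// Returns the requested domains that are not legal for the given entity type.
// An empty result does not cover the "at least one domain" rule; check that separately.
function findDomainViolations(entityType: EntityType, domains: string[]): string[] {
  const allowed = ALLOWED_DOMAINS[entityType];
  return domains.filter((d) => allowed.indexOf(d) === -1);
}
```

For instance, `findDomainViolations("business", ["geo", "industry"])` reports `geo` as the only violation, matching the person-only rule.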

Request Schema:

interface GroupEntitiesByTraitParams {
  entity_type: "person" | "business";
  entity_ids?: Array<string | number>;
  entity_ids_uri?: string;
  entity_id_column?: string;
  // Allowed values depend on entity_type — values outside the per-entity set
  // are rejected with a validation error listing the legal values.
  // person: Array<"affinity" | "content" | "demographic" | "employment" | "financial" | "geo" | "household" | "intent" | "interest" | "lifestyle" | "political" | "purchase">
  // business: Array<"about" | "appstore" | "digital" | "funding" | "hiring" | "industry" | "techstack">
  domains: Array<
    | "affinity" | "content" | "demographic" | "employment" | "financial" | "geo"
    | "household" | "intent" | "interest" | "lifestyle" | "political" | "purchase"
    | "about" | "appstore" | "digital" | "funding" | "hiring" | "industry" | "techstack"
  >;
  trait_limit?: number;
  workflow_id?: string;
}

Output Format

{
  enrichment: {
    total_entities: number;
    enriched_entities: number;
    profiles_with_traits: number;
    enrichment_rate: number;
    by_domain: Record<string, number>;
  },
  trait_frequencies: Array<{
    trait_hash: string;
    trait_name: string;
    trait_value: string;
    domain: string;
    audience_count: number;
    audience_prevalence: number;
  }>,
  resourceLinks: Array<{
    uri: string;
    name: string;
    mimeType: string;
  }>,
  tool_trace_id: string,
  workflow_id: string
}

Response Fields:

| Field | Type | Description |
|---|---|---|
| enrichment.total_entities | number | Total input entities |
| enrichment.enriched_entities | number | Profiles returned by entity_enrich (resolution count) |
| enrichment.profiles_with_traits | number | Profiles that produced ≥1 normalized field across the requested domains. Use this to detect a healthy resolution rate paired with zero trait yield: enrichment_rate near 1.0 with profiles_with_traits == 0 means resolution succeeded but no trait data was found |
| enrichment.enrichment_rate | number | Enrichment success rate (0-1) |
| enrichment.by_domain | object | Enriched count per domain |
| trait_frequencies | array | Trait frequency distribution for the audience |
| trait_frequencies[].trait_hash | string | Stable trait hash |
| trait_frequencies[].trait_name | string | Trait name |
| trait_frequencies[].trait_value | string | Trait value |
| trait_frequencies[].domain | string | Domain category |
| trait_frequencies[].audience_count | number | Entities with this trait |
| trait_frequencies[].audience_prevalence | number | Audience proportion (0-1) |
| resourceLinks | array | MCP resource links to persisted artifacts. Populated when a workflow_id is in scope for the call (provided by the caller or established by the workflow session); empty if validation fails |
| resourceLinks[].uri | string | Workflow resource URI (e.g. workflow://<workflow_id>/artifacts/trait_frequencies.parquet) |
| resourceLinks[].name | string | Artifact filename (trait_frequencies.parquet) |
| resourceLinks[].mimeType | string | MIME type (application/parquet) |
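
The profiles_with_traits diagnostic described above can be turned into a small check on the enrichment block. This is only a sketch: `classifyYield` and its three labels are hypothetical and not part of the tool's response. It uses a strict zero test rather than a threshold, since the documentation only describes the zero-yield case.

```typescript
interface EnrichmentSummary {
  total_entities: number;
  enriched_entities: number;
  profiles_with_traits: number;
  enrichment_rate: number;
  by_domain: Record<string, number>;
}

// Hypothetical labels for the situations the response fields distinguish.
type YieldStatus = "no_resolution" | "resolved_without_traits" | "ok";

function classifyYield(e: EnrichmentSummary): YieldStatus {
  // Nothing resolved, so trait yield cannot be judged.
  if (e.enriched_entities === 0) return "no_resolution";
  // Resolution succeeded but no profile produced a normalized field.
  if (e.profiles_with_traits === 0) return "resolved_without_traits";
  return "ok";
}
```

A `resolved_without_traits` result suggests looking at the requested domains rather than at the input IDs, since the IDs did resolve.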

Example Response:

{
  "enrichment": {
    "total_entities": 500,
    "enriched_entities": 425,
    "profiles_with_traits": 410,
    "enrichment_rate": 0.85,
    "by_domain": {
      "demographic": 400,
      "affinity": 380,
      "intent": 350
    }
  },
  "trait_frequencies": [
    {
      "trait_hash": "a1b2c3d4e5f67890",
      "trait_name": "tech_affinity",
      "trait_value": "high",
      "domain": "affinity",
      "audience_count": 225,
      "audience_prevalence": 0.45
    },
    {
      "trait_hash": "b2c3d4e5f6789012",
      "trait_name": "income_level",
      "trait_value": "high",
      "domain": "demographic",
      "audience_count": 190,
      "audience_prevalence": 0.38
    }
  ],
  "resourceLinks": [
    {
      "uri": "workflow://550e8400-e29b-41d4-a716-446655440000/artifacts/trait_frequencies.parquet",
      "name": "trait_frequencies.parquet",
      "mimeType": "application/parquet"
    }
  ],
  "tool_trace_id": "a1b2c3d4e5f6",
  "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}

Common Errors

| Condition | Error message |
|---|---|
| Neither entity_ids nor entity_ids_uri provided (or entity_ids is empty) | "Provide exactly one input: entity_ids or entity_ids_uri" |
| entity_ids_uri does not end in .csv or .parquet | "entity_ids_uri must point to a .csv or .parquet file, got: <uri>" |
| domains contains a value not allowed for the given entity_type | "Trait domains not allowed for entity_type='<entityType>'. Allowed: <allowed>. Violations: <violations>." |

Chaining to calculate_trait_lift

When a workflow_id is provided, group_entities_by_trait persists a trait_frequencies.parquet artifact. Pass the resource URI to calculate_trait_lift:

{
  "entity_type": "person",
  "trait_frequencies_uri": "workflow://550e8400-e29b-41d4-a716-446655440000/artifacts/trait_frequencies.parquet"
}

Clients can pick up the artifact two ways — both reference the same file:

  • The workflow://…/trait_frequencies.parquet URI shown above (constructable directly from workflow_id).
  • The resourceLinks[0].uri returned in the tool response, fetched via the MCP resource protocol.
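
Both pickup routes can be combined in one helper that prefers the returned link and falls back to the constructed URI. A minimal sketch: `traitFrequenciesUri` and `buildTraitLiftParams` are hypothetical names, matching the link by its name field is an assumption, and the parameter object contains only the two calculate_trait_lift fields shown in the request above.

```typescript
interface ResourceLink {
  uri: string;
  name: string;
  mimeType: string;
}

// Builds the artifact URI from a workflow ID, following the documented pattern.
function traitFrequenciesUri(workflowId: string): string {
  return "workflow://" + workflowId + "/artifacts/trait_frequencies.parquet";
}

// Prefers the URI returned in resourceLinks; otherwise constructs it from workflow_id.
function buildTraitLiftParams(
  entityType: "person" | "business",
  workflowId: string,
  links: ResourceLink[]
): { entity_type: string; trait_frequencies_uri: string } {
  let uri = traitFrequenciesUri(workflowId);
  for (let i = 0; i < links.length; i++) {
    // Assumption: the artifact is identified by its documented filename.
    if (links[i].name === "trait_frequencies.parquet") {
      uri = links[i].uri;
      break;
    }
  }
  return { entity_type: entityType, trait_frequencies_uri: uri };
}
```

Since resourceLinks is empty when no workflow_id is in scope, the fallback URI is only meaningful when the caller knows the artifact was actually persisted.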

Usage Examples

Example 1: Inline entity IDs

{
  "entity_type": "person",
  "entity_ids": ["123", "456", "789"],
  "domains": ["demographic", "affinity", "intent"]
}

Example 2: From entity_resolve output

{
  "entity_type": "person",
  "entity_ids_uri": "workflow://550e8400-e29b-41d4-a716-446655440000/artifacts/resolved_identities.parquet",
  "entity_id_column": "entity_id",
  "domains": ["demographic", "affinity", "interest"],
  "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}

Example 3: Limited trait output

{
  "entity_type": "person",
  "entity_ids": ["123", "456"],
  "domains": ["demographic"],
  "trait_limit": 20
}

Group by geographic dimension

Passing "geo" in domains aggregates the audience along boundary types (state, zip5, county, dma, cbsa, msa, congressional_district). Each entity contributes one row per boundary it belongs to, so the resulting trait_frequencies answers "where does this audience concentrate."

geo composes with other domains in the same call — the tool runs the profile-enrichment pass and the boundary-bitmap pass in parallel and merges the results. A geo-only call skips profile enrichment entirely.

Example: group an audience by state and DMA

{
  "entity_type": "person",
  "entity_ids_uri": "workflow://550e8400-e29b-41d4-a716-446655440000/artifacts/resolved_identities.parquet",
  "domains": ["geo"],
  "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}

The returned trait_frequencies rows use trait_name = <boundary_type> and trait_value = <boundary_value>:

{
  "trait_frequencies": [
    {
      "trait_hash": "<geo.state=CA hash>",
      "trait_name": "state",
      "trait_value": "CA",
      "domain": "geo",
      "audience_count": 1820,
      "audience_prevalence": 0.36
    },
    {
      "trait_hash": "<geo.dma=803 hash>",
      "trait_name": "dma",
      "trait_value": "803",
      "domain": "geo",
      "audience_count": 510,
      "audience_prevalence": 0.10
    }
  ]
}
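
Because a geo call returns rows for every boundary type in one list, a client will usually filter by trait_name before ranking. The sketch below shows one way to do that; `topGeoRows` is a hypothetical helper, and ranking by audience_count is a client-side choice, not something the tool guarantees about row order.

```typescript
interface TraitFrequency {
  trait_hash: string;
  trait_name: string;
  trait_value: string;
  domain: string;
  audience_count: number;
  audience_prevalence: number;
}

// Returns the geo rows for one boundary type, largest audience_count first.
function topGeoRows(
  rows: TraitFrequency[],
  boundaryType: string,
  limit: number
): TraitFrequency[] {
  return rows
    // filter() returns a new array, so the sort below does not mutate the input.
    .filter((r) => r.domain === "geo" && r.trait_name === boundaryType)
    .sort((a, b) => b.audience_count - a.audience_count)
    .slice(0, limit);
}
```

Calling `topGeoRows(response.trait_frequencies, "state", 5)` on a parsed response would give the five states with the most audience members, leaving dma, zip5, and other boundary rows out.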

Chain the parquet artifact into calculate_trait_lift to surface which states or DMAs over-index for the audience versus the national baseline.

geo is person-only. A business call with "geo" in domains is rejected at validation.
