calculate_trait_lift — Watt Data Docs

Compare audience trait frequencies against the world baseline to surface the traits that most define your audience.

Quick Example

{
  "entity_type": "person",
  "trait_frequencies_uri": "workflow://550e8400-e29b-41d4-a716-446655440000/artifacts/trait_frequencies.parquet"
}

Input Parameters

Parameter	Type	Required	Default	Constraints	Description
entity_type	string	Yes	-	"person" or "business"	Type of entity being analyzed
audience_frequencies	array	Conditional	-	Array of trait_hash + prevalence	Inline frequencies. Mutually exclusive with trait_frequencies_uri
trait_frequencies_uri	string	Conditional	-	workflow:// URI	Parquet file from group_entities_by_trait. Mutually exclusive with audience_frequencies
top_n	number	No	15	1-100	Number of top traits to return
include_under_represented	boolean	No	true	-	Include under-represented traits
audience_size	number	No	-	Integer >= 1	Audience size for Bayesian shrinkage
workflow_id	string	No	-	Valid UUID	Workflow ID for tracking and persistence

Parameter Details:

audience_frequencies vs trait_frequencies_uri:

Provide exactly one; they are mutually exclusive.
Use audience_frequencies for small inline datasets.
Use trait_frequencies_uri to chain from group_entities_by_trait output (recommended).

audience_frequencies format:

[
  { "trait_hash": "abc123", "audience_prevalence": 0.45 },
  { "trait_hash": "def456", "audience_prevalence": 0.32 }
]

Request Schema:

interface CalculateTraitLiftParams {
  entity_type: "person" | "business";
  audience_frequencies?: Array<{
    trait_hash: string;
    audience_prevalence: number;
  }>;
  trait_frequencies_uri?: string;
  top_n?: number;
  include_under_represented?: boolean;
  audience_size?: number;
  workflow_id?: string;
}
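The constraints in the parameter table can be checked client-side before a request is sent. The following is a minimal sketch, not part of the tool itself: `validateParams` is a hypothetical helper, and the UUID check is a loose regular expression that may not match the server's validation exactly.

```typescript
// Hypothetical pre-flight check; the server remains the source of truth.
interface CalculateTraitLiftParams {
  entity_type: "person" | "business";
  audience_frequencies?: Array<{ trait_hash: string; audience_prevalence: number }>;
  trait_frequencies_uri?: string;
  top_n?: number;
  include_under_represented?: boolean;
  audience_size?: number;
  workflow_id?: string;
}

// Loose UUID shape check (assumption: the server may validate differently).
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function validateParams(p: CalculateTraitLiftParams): string[] {
  const errors: string[] = [];
  const inline = p.audience_frequencies;
  const uri = p.trait_frequencies_uri;

  // Exactly one of the two inputs must be present.
  if ((inline !== undefined) === (uri !== undefined)) {
    errors.push("Provide exactly one input: audience_frequencies or trait_frequencies_uri");
  }
  if (uri !== undefined && !uri.startsWith("workflow://")) {
    errors.push("trait_frequencies_uri must be a workflow:// URI");
  }
  // top_n: 1-100.
  if (p.top_n !== undefined && (p.top_n < 1 || p.top_n > 100)) {
    errors.push("top_n must be between 1 and 100");
  }
  // audience_size: integer >= 1.
  if (p.audience_size !== undefined && (!Number.isInteger(p.audience_size) || p.audience_size < 1)) {
    errors.push("audience_size must be an integer >= 1");
  }
  if (p.workflow_id !== undefined && !UUID_RE.test(p.workflow_id)) {
    errors.push("workflow_id must be a valid UUID");
  }
  return errors;
}
```

An empty array means the request passed these local checks; it does not guarantee that the server will accept it.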

Output Format

{
  lift_scores: Array<{
    trait_hash: string;
    trait_name: string;
    trait_value: string;
    domain: string;
    audience_prevalence: number;
    world_prevalence: number;
    lift: number;
    under_represented: boolean;
  }>,
  trait_lookup_warning?: {
    code: string;  // currently the only emitted value is "TRAIT_LOOKUP_FAILURES"
    failed_count: number;
    total_count: number;
  },
  resourceLinks: Array<{
    uri: string;    // e.g. workflow://{workflow_id}/artifacts/lift_scores.parquet
    name: string;   // "lift_scores.parquet"
    mimeType: string; // "application/parquet"
  }>,
  tool_trace_id: string,
  workflow_id: string
}

Response Fields:

Field	Type	Description
lift_scores	array	Lift scores sorted by magnitude
lift_scores[].trait_hash	string	Stable trait hash
lift_scores[].trait_name	string	Trait name
lift_scores[].trait_value	string	Trait value
lift_scores[].domain	string	Domain category
lift_scores[].audience_prevalence	number	Prevalence in the audience (0-1)
lift_scores[].world_prevalence	number	Prevalence in the world baseline (0-1)
lift_scores[].lift	number	Bayesian-shrunk lift score (posterior prevalence / world prevalence) — small audiences are pulled toward 1.0 to avoid spurious lift on tiny n. >1 = over-represented, <1 = under-represented
lift_scores[].under_represented	boolean	Whether the trait is under-represented
trait_lookup_warning	object	Warning if some trait lookups failed
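trait_lookup_warning.code	string	Stable warning code (currently the only emitted value is `"TRAIT_LOOKUP_FAILURES"`)
trait_lookup_warning.failed_count	number	Number of trait lookups that failed
trait_lookup_warning.total_count	number	Total number of trait lookups attempted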
resourceLinks	array	MCP resource links to persisted lift_scores.parquet. Populated when a workflow context is present; empty otherwise
resourceLinks[].uri	string	Workflow resource URI (e.g. `workflow://{workflow_id}/artifacts/lift_scores.parquet`)
resourceLinks[].name	string	Artifact filename (`lift_scores.parquet`)
resourceLinks[].mimeType	string	MIME type (`application/parquet`)

Example Response:

{
  "lift_scores": [
    {
      "trait_hash": "abc123def456",
      "trait_name": "golf_affinity",
      "trait_value": "high",
      "domain": "affinity",
      "audience_prevalence": 0.45,
      "world_prevalence": 0.12,
      "lift": 3.75,
      "under_represented": false
    },
    {
      "trait_hash": "ghi789jkl012",
      "trait_name": "income_range",
      "trait_value": "150000_plus",
      "domain": "demographic",
      "audience_prevalence": 0.32,
      "world_prevalence": 0.08,
      "lift": 4.0,
      "under_represented": false
    }
  ],
  "resourceLinks": [
    {
      "uri": "workflow://550e8400-e29b-41d4-a716-446655440000/artifacts/lift_scores.parquet",
      "name": "lift_scores.parquet",
      "mimeType": "application/parquet"
    }
  ],
  "tool_trace_id": "a1b2c3d4e5f6",
  "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}
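A response like the one above can be split into over- and under-represented traits using the `under_represented` flag. The sketch below is illustrative only: `summarizeLift` is a hypothetical helper name, and the warning text it builds is an assumed reading of `failed_count` and `total_count`.

```typescript
interface LiftScore {
  trait_hash: string;
  trait_name: string;
  trait_value: string;
  domain: string;
  audience_prevalence: number;
  world_prevalence: number;
  lift: number;
  under_represented: boolean;
}

interface CalculateTraitLiftResponse {
  lift_scores: LiftScore[];
  trait_lookup_warning?: { code: string; failed_count: number; total_count: number };
  resourceLinks: Array<{ uri: string; name: string; mimeType: string }>;
  tool_trace_id: string;
  workflow_id: string;
}

// Hypothetical helper: partitions scores and surfaces the optional fields.
function summarizeLift(res: CalculateTraitLiftResponse) {
  const over = res.lift_scores.filter((s) => !s.under_represented);
  const under = res.lift_scores.filter((s) => s.under_represented);

  // resourceLinks is empty when no workflow context is present.
  const artifactUri = res.resourceLinks.find((l) => l.name === "lift_scores.parquet")?.uri;

  // trait_lookup_warning is optional; only build a message when it is present.
  const w = res.trait_lookup_warning;
  const warning = w
    ? `${w.failed_count} of ${w.total_count} trait lookups failed (${w.code})`
    : undefined;

  return { over, under, artifactUri, warning };
}
```

Checking `warning` before trusting the ranking is a reasonable precaution, since failed lookups may mean some traits are missing from `lift_scores`.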

Understanding Lift Scores

Lift > 1: Trait is over-represented in your audience vs. the general population
Lift = 1: Trait prevalence matches the general population
Lift < 1: Trait is under-represented in your audience
The further a lift value is from 1, in either direction, the more the trait distinguishes your audience
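To build intuition for how shrinkage pulls small audiences toward 1.0, here is one common way to shrink a prevalence toward a baseline. This is an assumption for illustration, not the tool's documented formula: the prior form and the `priorStrength` pseudo-count are hypothetical, and the actual implementation may differ.

```typescript
// Illustrative only: shrinks the audience prevalence toward the world
// prevalence using a pseudo-count prior, then divides by the world prevalence.
// priorStrength is a hypothetical tuning value, not a documented parameter.
function shrunkLift(
  audiencePrevalence: number,
  worldPrevalence: number,
  audienceSize: number,
  priorStrength: number = 50,
): number {
  if (worldPrevalence <= 0) {
    throw new RangeError("worldPrevalence must be greater than 0");
  }
  // Weighted average of observed prevalence and baseline:
  // the baseline dominates for a small audienceSize, and the observation
  // dominates for a large audienceSize.
  const posterior =
    (audiencePrevalence * audienceSize + worldPrevalence * priorStrength) /
    (audienceSize + priorStrength);
  return posterior / worldPrevalence;
}
```

Under this sketch, an audience of size 0 yields a lift of 1, and a very large audience approaches the raw ratio audience_prevalence / world_prevalence.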

Common Errors

Condition	Error message
Neither `audience_frequencies` nor `trait_frequencies_uri` provided	`"Provide exactly one input: audience_frequencies or trait_frequencies_uri"`
`domains` contains a value not allowed for the given `entity_type`	`"Trait domains not allowed for entity_type='<entityType>'. Allowed: <allowed>. Violations: <violations>."`

Usage Examples

Example 1: From group_entities_by_trait output (recommended)

{
  "entity_type": "person",
  "trait_frequencies_uri": "workflow://550e8400-e29b-41d4-a716-446655440000/artifacts/trait_frequencies.parquet",
  "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}
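When chaining programmatically, a request like the one above can be derived from the upstream output. The sketch below assumes that group_entities_by_trait returns resource links in the same shape as this tool's `resourceLinks`, which this page does not confirm. `buildLiftRequest` is a hypothetical helper.

```typescript
interface ResourceLink {
  uri: string;
  name: string;
  mimeType: string;
}

// Hypothetical helper: finds the trait_frequencies.parquet artifact among the
// upstream links and reuses the workflow ID embedded in its URI.
function buildLiftRequest(entityType: "person" | "business", upstreamLinks: ResourceLink[]) {
  const link = upstreamLinks.find((l) =>
    l.uri.endsWith("/artifacts/trait_frequencies.parquet"),
  );
  if (!link) {
    throw new Error("No trait_frequencies.parquet link found in upstream output");
  }
  // URIs follow workflow://{workflow_id}/artifacts/{file}.
  const match = /^workflow:\/\/([^/]+)\//.exec(link.uri);
  if (!match) {
    throw new Error("Unexpected URI format: " + link.uri);
  }
  return {
    entity_type: entityType,
    trait_frequencies_uri: link.uri,
    workflow_id: match[1],
  };
}
```

Passing the same `workflow_id` keeps the lift artifact in the workflow that produced the frequencies.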

Example 2: Inline frequencies

{
  "entity_type": "person",
  "audience_frequencies": [
    { "trait_hash": "abc123", "audience_prevalence": 0.45 },
    { "trait_hash": "def456", "audience_prevalence": 0.32 },
    { "trait_hash": "ghi789", "audience_prevalence": 0.28 }
  ],
  "top_n": 10
}

Example 3: Without under-represented traits

{
  "entity_type": "person",
  "trait_frequencies_uri": "workflow://550e8400-e29b-41d4-a716-446655440000/artifacts/trait_frequencies.parquet",
  "include_under_represented": false,
  "top_n": 25
}

Geographic lift

Geo participates in calculate_trait_lift in two ways. Both reuse the existing tool surface; neither adds parameters.

Geo traits in the input. A group_entities_by_trait run with "geo" in domains produces a trait_frequencies.parquet whose rows include boundary memberships (state, dma, county, etc.) alongside any other domains requested. Passing that parquet here ranks the boundaries that over-index — "this audience is concentrated in California and the Boston DMA" — in the same lift_scores array as the non-geo traits. Geo trait hashes resolve through the same lift pipeline; the world-prevalence lookup falls back to the geo catalog when a hash isn't in the main trait table, so geo entries don't surface TRAIT_LOOKUP_FAILURES.

Geo-scoped audiences. To answer "what's distinctive about my golfers in California", filter the audience to the region first (via entity_find with a geo trait hash, e.g. geo.state=CA AND interest.golf), then run group_entities_by_trait → calculate_trait_lift as usual. The lift is computed against the global world baseline; the audience is the CA cohort.

Example: which states over-index for an audience

{
  "entity_type": "person",
  "trait_frequencies_uri": "workflow://550e8400-e29b-41d4-a716-446655440000/artifacts/trait_frequencies.parquet",
  "top_n": 10
}

This request assumes the upstream group_entities_by_trait call used domains: ["geo"]. Returned rows carry domain: "geo" and a trait_name of "state", "dma", "county", or another boundary type, depending on which boundary types the aggregator hit.
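Because geo rows share the lift_scores array with any other requested domains, a client may want to pull them out and group them by boundary type. A minimal sketch, with `geoLiftByBoundary` as a hypothetical helper name:

```typescript
// Only the fields this helper reads; real rows carry the full lift_scores shape.
interface GeoLiftRow {
  trait_name: string; // boundary type, e.g. "state" or "dma"
  trait_value: string;
  domain: string;
  lift: number;
}

// Keeps rows with domain "geo", groups them by boundary type (trait_name),
// and sorts each group by descending lift.
function geoLiftByBoundary(scores: GeoLiftRow[]): Record<string, GeoLiftRow[]> {
  const byBoundary: Record<string, GeoLiftRow[]> = {};
  for (const row of scores) {
    if (row.domain !== "geo") continue;
    const bucket = byBoundary[row.trait_name] ?? [];
    bucket.push(row);
    byBoundary[row.trait_name] = bucket;
  }
  for (const name of Object.keys(byBoundary)) {
    byBoundary[name].sort((a, b) => b.lift - a.lift);
  }
  return byBoundary;
}
```

Grouping by `trait_name` keeps states, DMAs, and counties in separate rankings, which avoids comparing lift across boundary types of very different sizes.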

Geo is person-only — entity_type: "business" rejects geo trait hashes at validation.