Watt Data

Compare audience trait frequencies against the world baseline to surface the traits that most define your audience.

Quick Example

{
  "entity_type": "person",
  "trait_frequencies_uri": "workflow://550e8400-e29b-41d4-a716-446655440000/artifacts/trait_frequencies.parquet"
}

Input Parameters

| Parameter | Type | Required | Default | Constraints | Description |
| --- | --- | --- | --- | --- | --- |
| entity_type | string | Yes | - | "person" or "business" | Type of entity being analyzed |
| audience_frequencies | array | Conditional | - | Array of trait_hash + prevalence | Inline frequencies. Mutually exclusive with trait_frequencies_uri |
| trait_frequencies_uri | string | Conditional | - | workflow:// URI | Parquet file from group_entities_by_trait. Mutually exclusive with audience_frequencies |
| top_n | number | No | 15 | 1-100 | Number of top traits to return |
| include_under_represented | boolean | No | true | - | Include under-represented traits |
| audience_size | number | No | - | Integer >= 1 | Audience size for Bayesian shrinkage |
| workflow_id | string | No | - | Valid UUID | Workflow ID for tracking and persistence |

Parameter Details:

audience_frequencies vs trait_frequencies_uri:

  • Provide exactly one of the two. They are mutually exclusive.
  • Use audience_frequencies for small inline datasets.
  • Use trait_frequencies_uri to chain from group_entities_by_trait output (recommended).

audience_frequencies format:

[
  { "trait_hash": "abc123", "audience_prevalence": 0.45 },
  { "trait_hash": "def456", "audience_prevalence": 0.32 }
]

Request Schema:

interface CalculateTraitLiftParams {
  entity_type: "person" | "business";
  audience_frequencies?: Array<{
    trait_hash: string;
    audience_prevalence: number;
  }>;
  trait_frequencies_uri?: string;
  top_n?: number;
  include_under_represented?: boolean;
  audience_size?: number;
  workflow_id?: string;
}
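
The constraints in the parameter table can be checked client-side before a request is sent. The following is a minimal sketch, not the tool's own validation: `validateParams` and `UUID_RE` are hypothetical names, the 0-1 range on inline `audience_prevalence` is an assumption carried over from the response field description, and only the "exactly one input" message is taken from this page.

```typescript
// Shape copied from the request schema; the validator below is an illustrative sketch.
interface CalculateTraitLiftParams {
  entity_type: "person" | "business";
  audience_frequencies?: Array<{ trait_hash: string; audience_prevalence: number }>;
  trait_frequencies_uri?: string;
  top_n?: number;
  include_under_represented?: boolean;
  audience_size?: number;
  workflow_id?: string;
}

const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Hypothetical client-side check: returns a list of problems (empty means OK).
function validateParams(p: CalculateTraitLiftParams): string[] {
  const problems: string[] = [];
  const hasInline = p.audience_frequencies !== undefined;
  const hasUri = p.trait_frequencies_uri !== undefined;
  // Both present or both absent violates the "exactly one" rule.
  if (hasInline === hasUri) {
    problems.push("Provide exactly one input: audience_frequencies or trait_frequencies_uri");
  }
  if (p.trait_frequencies_uri !== undefined && !/^workflow:\/\//.test(p.trait_frequencies_uri)) {
    problems.push("trait_frequencies_uri must be a workflow:// URI");
  }
  for (const f of p.audience_frequencies || []) {
    // Assumption: inline prevalence is a fraction between 0 and 1.
    if (!(f.audience_prevalence >= 0 && f.audience_prevalence <= 1)) {
      problems.push(`audience_prevalence for ${f.trait_hash} must be between 0 and 1`);
    }
  }
  if (p.top_n !== undefined && !(p.top_n >= 1 && p.top_n <= 100)) {
    problems.push("top_n must be between 1 and 100");
  }
  if (
    p.audience_size !== undefined &&
    !(p.audience_size >= 1 && Math.floor(p.audience_size) === p.audience_size)
  ) {
    problems.push("audience_size must be an integer >= 1");
  }
  if (p.workflow_id !== undefined && !UUID_RE.test(p.workflow_id)) {
    problems.push("workflow_id must be a valid UUID");
  }
  return problems;
}
```

Returning a list rather than throwing on the first problem lets a caller report every issue at once.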

Output Format

{
  lift_scores: Array<{
    trait_hash: string;
    trait_name: string;
    trait_value: string;
    domain: string;
    audience_prevalence: number;
    world_prevalence: number;
    lift: number;
    under_represented: boolean;
  }>,
  trait_lookup_warning?: {
    code: string;  // currently the only emitted value is "TRAIT_LOOKUP_FAILURES"
    failed_count: number;
    total_count: number;
  },
  resourceLinks: Array<{
    uri: string;    // e.g. workflow://{workflow_id}/artifacts/lift_scores.parquet
    name: string;   // "lift_scores.parquet"
    mimeType: string; // "application/parquet"
  }>,
  tool_trace_id: string,
  workflow_id: string
}

Response Fields:

| Field | Type | Description |
| --- | --- | --- |
| lift_scores | array | Lift scores sorted by magnitude |
| lift_scores[].trait_hash | string | Stable trait hash |
| lift_scores[].trait_name | string | Trait name |
| lift_scores[].trait_value | string | Trait value |
| lift_scores[].domain | string | Domain category |
| lift_scores[].audience_prevalence | number | Prevalence in the audience (0-1) |
| lift_scores[].world_prevalence | number | Prevalence in the world baseline (0-1) |
| lift_scores[].lift | number | Bayesian-shrunk lift score (posterior prevalence / world prevalence). Small audiences are pulled toward 1.0 to avoid spurious lift on tiny n. >1 = over-represented, <1 = under-represented |
| lift_scores[].under_represented | boolean | Whether the trait is under-represented |
| trait_lookup_warning | object | Warning if some trait lookups failed |
| trait_lookup_warning.code | string | Stable warning code (currently the only emitted value is "TRAIT_LOOKUP_FAILURES") |
| resourceLinks | array | MCP resource links to persisted lift_scores.parquet. Populated when a workflow context is present; empty otherwise |
| resourceLinks[].uri | string | Workflow resource URI (e.g. workflow://{workflow_id}/artifacts/lift_scores.parquet) |
| resourceLinks[].name | string | Artifact filename (lift_scores.parquet) |
| resourceLinks[].mimeType | string | MIME type (application/parquet) |

Example Response:

{
  "lift_scores": [
    {
      "trait_hash": "abc123def456",
      "trait_name": "golf_affinity",
      "trait_value": "high",
      "domain": "affinity",
      "audience_prevalence": 0.45,
      "world_prevalence": 0.12,
      "lift": 3.75,
      "under_represented": false
    },
    {
      "trait_hash": "ghi789jkl012",
      "trait_name": "income_range",
      "trait_value": "150000_plus",
      "domain": "demographic",
      "audience_prevalence": 0.32,
      "world_prevalence": 0.08,
      "lift": 4.0,
      "under_represented": false
    }
  ],
  "resourceLinks": [
    {
      "uri": "workflow://550e8400-e29b-41d4-a716-446655440000/artifacts/lift_scores.parquet",
      "name": "lift_scores.parquet",
      "mimeType": "application/parquet"
    }
  ],
  "tool_trace_id": "a1b2c3d4e5f6",
  "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}
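
A response like the one above can be reduced to a short, readable summary. This sketch assumes the response has already been parsed into the documented shape; `summarizeLift` is a hypothetical helper, not part of the tool.

```typescript
// Types follow the documented output format.
interface LiftScore {
  trait_hash: string;
  trait_name: string;
  trait_value: string;
  domain: string;
  audience_prevalence: number;
  world_prevalence: number;
  lift: number;
  under_represented: boolean;
}

interface CalculateTraitLiftResponse {
  lift_scores: LiftScore[];
  trait_lookup_warning?: { code: string; failed_count: number; total_count: number };
  resourceLinks: Array<{ uri: string; name: string; mimeType: string }>;
  tool_trace_id: string;
  workflow_id: string;
}

// Hypothetical helper: one line per over-represented trait, highest lift first,
// followed by a warning line when some trait lookups failed.
function summarizeLift(res: CalculateTraitLiftResponse): string[] {
  const lines = res.lift_scores
    .filter((s) => !s.under_represented) // filter() copies, so the input is not reordered
    .sort((a, b) => b.lift - a.lift)
    .map(
      (s) =>
        `${s.trait_name}=${s.trait_value}: ${s.lift.toFixed(2)}x ` +
        `(${(s.audience_prevalence * 100).toFixed(0)}% vs ${(s.world_prevalence * 100).toFixed(0)}%)`
    );
  const w = res.trait_lookup_warning;
  if (w) {
    lines.push(`warning ${w.code}: ${w.failed_count} of ${w.total_count} trait lookups failed`);
  }
  return lines;
}
```

Checking `trait_lookup_warning` matters because the field is optional: its absence means every lookup succeeded, not that the check was skipped.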

Understanding Lift Scores

  • Lift > 1: Trait is over-represented in your audience vs. the general population
  • Lift = 1: Trait prevalence matches the general population
  • Lift < 1: Trait is under-represented in your audience
  • The further a lift value is from 1, in either direction, the more the trait distinguishes your audience
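
To make the "pulled toward 1.0" behavior concrete, here is one common way to shrink a prevalence toward a baseline, using the world prevalence as a prior weighted by a pseudo-count. This page does not document the tool's actual prior or its strength, so `shrunkLift` and the default `priorStrength` of 50 are assumptions chosen only for illustration.

```typescript
// Illustrative shrinkage, NOT the tool's documented formula.
// posterior = (n * p_audience + k * p_world) / (n + k), lift = posterior / p_world
// where n is the audience size and k is a hypothetical prior strength (pseudo-count).
function shrunkLift(
  audiencePrevalence: number,
  worldPrevalence: number,
  audienceSize: number,
  priorStrength: number = 50 // assumed value; the real strength is not documented here
): number {
  if (worldPrevalence <= 0) {
    throw new Error("worldPrevalence must be greater than 0");
  }
  const observed = audiencePrevalence * audienceSize; // members carrying the trait
  const posterior =
    (observed + priorStrength * worldPrevalence) / (audienceSize + priorStrength);
  return posterior / worldPrevalence;
}

// With a tiny audience the lift stays close to 1; with a large one it
// approaches the raw ratio 0.45 / 0.12 = 3.75.
console.log(shrunkLift(0.45, 0.12, 10).toFixed(2)); // roughly 1.46
console.log(shrunkLift(0.45, 0.12, 1000000).toFixed(2)); // roughly 3.75
```

Under this scheme, supplying audience_size is what lets small audiences be discounted; a trait whose audience prevalence equals the world prevalence gets a lift of 1 at any audience size.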

Common Errors

| Condition | Error message |
| --- | --- |
| Neither audience_frequencies nor trait_frequencies_uri provided | "Provide exactly one input: audience_frequencies or trait_frequencies_uri" |
| domains contains a value not allowed for the given entity_type | "Trait domains not allowed for entity_type='<entityType>'. Allowed: <allowed>. Violations: <violations>." |

Usage Examples

Example 1: From group_entities_by_trait output (recommended)

{
  "entity_type": "person",
  "trait_frequencies_uri": "workflow://550e8400-e29b-41d4-a716-446655440000/artifacts/trait_frequencies.parquet",
  "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}
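
In Example 1 the workflow_id matches the UUID embedded in the workflow:// URI. A small helper can derive one from the other so the two never drift apart. This is a sketch: `parseWorkflowUri` and `buildChainedRequest` are hypothetical names, the URI layout is taken from the examples on this page, and whether the tool requires the two IDs to match is not stated here.

```typescript
interface WorkflowArtifact {
  workflowId: string;
  artifact: string;
}

interface ChainedRequest {
  entity_type: "person" | "business";
  trait_frequencies_uri: string;
  workflow_id: string;
}

// Parses workflow://{workflow_id}/artifacts/{name}; returns null for any other shape.
function parseWorkflowUri(uri: string): WorkflowArtifact | null {
  const m = /^workflow:\/\/([0-9a-f-]{36})\/artifacts\/(.+)$/i.exec(uri);
  return m ? { workflowId: m[1], artifact: m[2] } : null;
}

// Hypothetical helper: builds the chained request with workflow_id taken from the URI.
function buildChainedRequest(
  entityType: "person" | "business",
  traitFrequenciesUri: string
): ChainedRequest {
  const parsed = parseWorkflowUri(traitFrequenciesUri);
  if (parsed === null) {
    throw new Error(`Not a workflow artifact URI: ${traitFrequenciesUri}`);
  }
  return {
    entity_type: entityType,
    trait_frequencies_uri: traitFrequenciesUri,
    workflow_id: parsed.workflowId
  };
}
```

Calling `buildChainedRequest("person", uri)` with the URI from Example 1 yields the same three fields shown there.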

Example 2: Inline frequencies

{
  "entity_type": "person",
  "audience_frequencies": [
    { "trait_hash": "abc123", "audience_prevalence": 0.45 },
    { "trait_hash": "def456", "audience_prevalence": 0.32 },
    { "trait_hash": "ghi789", "audience_prevalence": 0.28 }
  ],
  "top_n": 10
}
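
When you start from raw counts rather than prevalences, the inline array can be derived before the call. This sketch assumes audience_prevalence means the fraction of audience members carrying a trait, which matches the 0-1 range in the response fields but is not spelled out for the input. `buildInlineRequest` is a hypothetical helper.

```typescript
interface InlineRequest {
  entity_type: "person" | "business";
  audience_frequencies: Array<{ trait_hash: string; audience_prevalence: number }>;
  audience_size: number;
}

// Hypothetical helper: converts per-trait member counts into an inline request.
// Assumption: prevalence = members with the trait / audience size.
function buildInlineRequest(
  entityType: "person" | "business",
  countsByTraitHash: Record<string, number>,
  audienceSize: number
): InlineRequest {
  if (!(audienceSize >= 1) || Math.floor(audienceSize) !== audienceSize) {
    throw new Error("audienceSize must be an integer >= 1");
  }
  const frequencies = Object.keys(countsByTraitHash).map((trait_hash) => {
    const count = countsByTraitHash[trait_hash];
    if (!(count >= 0 && count <= audienceSize)) {
      throw new Error(`count for ${trait_hash} must be between 0 and audienceSize`);
    }
    return { trait_hash, audience_prevalence: count / audienceSize };
  });
  return {
    entity_type: entityType,
    audience_frequencies: frequencies,
    audience_size: audienceSize // passed along so the tool can apply its shrinkage
  };
}
```

Because the audience size is already known at this point, the sketch forwards it as audience_size rather than leaving it unset.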

Example 3: Without under-represented traits

{
  "entity_type": "person",
  "trait_frequencies_uri": "workflow://550e8400-e29b-41d4-a716-446655440000/artifacts/trait_frequencies.parquet",
  "include_under_represented": false,
  "top_n": 25
}

Geographic lift

Geo participates in calculate_trait_lift in two ways — both reuse the existing tool surface, no new parameters.

Geo traits in the input. A group_entities_by_trait run with "geo" in domains produces a trait_frequencies.parquet whose rows include boundary memberships (state, dma, county, etc.) alongside any other domains requested. Passing that parquet here ranks the boundaries that over-index — "this audience is concentrated in California and the Boston DMA" — in the same lift_scores array as the non-geo traits. Geo trait hashes resolve through the same lift pipeline; the world-prevalence lookup falls back to the geo catalog when a hash isn't in the main trait table, so geo entries don't surface TRAIT_LOOKUP_FAILURES.

Geo-scoped audiences. To answer "what's distinctive about my golfers in California", filter the audience to the region first (via entity_find with a geo trait hash, e.g. geo.state=CA AND interest.golf), then run group_entities_by_trait → calculate_trait_lift as usual. The lift is computed against the global world baseline; the audience is the CA cohort.

Example: which states over-index for an audience

{
  "entity_type": "person",
  "trait_frequencies_uri": "workflow://550e8400-e29b-41d4-a716-446655440000/artifacts/trait_frequencies.parquet",
  "top_n": 10
}

This example assumes the upstream group_entities_by_trait call used domains: ["geo"]. Returned rows carry domain: "geo" and trait_name: "state" (or "dma", "county", etc., depending on which boundary types the aggregator hit).
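
Because geo rows arrive in the same lift_scores array as every other trait, a client may want to separate them by boundary type. A minimal sketch, assuming rows follow the documented response shape; `geoByBoundary` is a hypothetical helper and only the fields it reads are typed.

```typescript
// Subset of the documented lift_scores row used by this sketch.
interface GeoLiftRow {
  trait_name: string; // boundary type for geo rows, e.g. "state", "dma", "county"
  trait_value: string;
  domain: string;
  lift: number;
}

// Hypothetical helper: keeps only domain "geo" rows, groups them by boundary
// type, and orders each group by descending lift.
function geoByBoundary(scores: GeoLiftRow[]): Record<string, GeoLiftRow[]> {
  const grouped: Record<string, GeoLiftRow[]> = {};
  for (const row of scores) {
    if (row.domain !== "geo") continue; // skip non-geo traits
    if (!grouped[row.trait_name]) grouped[row.trait_name] = [];
    grouped[row.trait_name].push(row);
  }
  for (const boundary of Object.keys(grouped)) {
    grouped[boundary].sort((a, b) => b.lift - a.lift);
  }
  return grouped;
}
```

For example, `geoByBoundary(response.lift_scores)["state"]` would list the over-indexing states first, provided the upstream run included state boundaries.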

Geo is person-only — entity_type: "business" rejects geo trait hashes at validation.
