Compare audience trait frequencies against the world baseline to surface the traits that most define your audience.
Quick Example
{
"entity_type": "person",
"trait_frequencies_uri": "workflow://550e8400-e29b-41d4-a716-446655440000/artifacts/trait_frequencies.parquet"
}
Input Parameters
| Parameter | Type | Required | Default | Constraints | Description |
|---|---|---|---|---|---|
| entity_type | string | Yes | - | "person" or "business" | Type of entity being analyzed |
| audience_frequencies | array | Conditional | - | Array of trait_hash + prevalence | Inline frequencies. Mutually exclusive with trait_frequencies_uri |
| trait_frequencies_uri | string | Conditional | - | workflow:// URI | Parquet file from group_entities_by_trait. Mutually exclusive with audience_frequencies |
| top_n | number | No | 15 | 1-100 | Number of top traits to return |
| include_under_represented | boolean | No | true | - | Include under-represented traits |
| audience_size | number | No | - | Integer >= 1 | Audience size for Bayesian shrinkage |
| workflow_id | string | No | - | Valid UUID | Workflow ID for tracking and persistence |
Parameter Details:
audience_frequencies vs trait_frequencies_uri:
- Provide exactly one. They are mutually exclusive.
- Use audience_frequencies for small inline datasets.
- Use trait_frequencies_uri to chain from group_entities_by_trait output (recommended).
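The exactly-one rule can be checked on the client before a request is sent. The following is a minimal sketch, not the tool's own validation: checkLiftInput and the InputCheck labels are hypothetical names introduced here, and only the two field names come from the parameter table.

```typescript
// Hypothetical client-side pre-check; only the field names come from the docs.
interface LiftInput {
  audience_frequencies?: Array<{ trait_hash: string; audience_prevalence: number }>;
  trait_frequencies_uri?: string;
}

// Illustrative labels, not error codes returned by the tool.
type InputCheck = "ok" | "none_provided" | "both_provided";

// Reports whether exactly one of the two mutually exclusive inputs is present.
function checkLiftInput(params: LiftInput): InputCheck {
  const hasInline = params.audience_frequencies !== undefined;
  const hasUri = params.trait_frequencies_uri !== undefined;
  if (hasInline && hasUri) return "both_provided";
  if (!hasInline && !hasUri) return "none_provided";
  return "ok";
}
```

Running a check like this first avoids a round trip that would end in the "Provide exactly one input" error listed under Common Errors.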
audience_frequencies format:
[
{ "trait_hash": "abc123", "audience_prevalence": 0.45 },
{ "trait_hash": "def456", "audience_prevalence": 0.32 }
]
Request Schema:
interface CalculateTraitLiftParams {
entity_type: "person" | "business";
audience_frequencies?: Array<{
trait_hash: string;
audience_prevalence: number;
}>;
trait_frequencies_uri?: string;
top_n?: number;
include_under_represented?: boolean;
audience_size?: number;
workflow_id?: string;
}
Output Format
{
lift_scores: Array<{
trait_hash: string;
trait_name: string;
trait_value: string;
domain: string;
audience_prevalence: number;
world_prevalence: number;
lift: number;
under_represented: boolean;
}>,
trait_lookup_warning?: {
code: string; // currently the only emitted value is "TRAIT_LOOKUP_FAILURES"
failed_count: number;
total_count: number;
},
resourceLinks: Array<{
uri: string; // e.g. workflow://{workflow_id}/artifacts/lift_scores.parquet
name: string; // "lift_scores.parquet"
mimeType: string; // "application/parquet"
}>,
tool_trace_id: string,
workflow_id: string
}
Response Fields:
| Field | Type | Description |
|---|---|---|
| lift_scores | array | Lift scores sorted by magnitude |
| lift_scores[].trait_hash | string | Stable trait hash |
| lift_scores[].trait_name | string | Trait name |
| lift_scores[].trait_value | string | Trait value |
| lift_scores[].domain | string | Domain category |
| lift_scores[].audience_prevalence | number | Prevalence in the audience (0-1) |
| lift_scores[].world_prevalence | number | Prevalence in the world baseline (0-1) |
| lift_scores[].lift | number | Bayesian-shrunk lift score (posterior prevalence / world prevalence) — small audiences are pulled toward 1.0 to avoid spurious lift on tiny n. >1 = over-represented, <1 = under-represented |
| lift_scores[].under_represented | boolean | Whether the trait is under-represented |
| trait_lookup_warning | object | Warning if some trait lookups failed |
| trait_lookup_warning.code | string | Stable warning code (currently the only emitted value is "TRAIT_LOOKUP_FAILURES") |
| resourceLinks | array | MCP resource links to persisted lift_scores.parquet. Populated when a workflow context is present; empty otherwise |
| resourceLinks[].uri | string | Workflow resource URI (e.g. workflow://{workflow_id}/artifacts/lift_scores.parquet) |
| resourceLinks[].name | string | Artifact filename (lift_scores.parquet) |
| resourceLinks[].mimeType | string | MIME type (application/parquet) |
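To make the shrinkage behavior of lift concrete, here is a rough sketch of one common way to shrink a prevalence toward a baseline. It is an assumption for illustration only: this page does not document the actual prior, and both shrunkLift and priorStrength are hypothetical. The sketch treats the world prevalence as a prior worth priorStrength pseudo-observations, so a tiny audience yields a lift near 1.0 and a large audience approaches the raw ratio audience_prevalence / world_prevalence.

```typescript
// Illustrative only: the tool's real prior and formula are not documented here.
// priorStrength is a hypothetical count of pseudo-observations at world prevalence.
function shrunkLift(
  audiencePrevalence: number,
  worldPrevalence: number,
  audienceSize: number,
  priorStrength: number = 100
): number {
  if (worldPrevalence <= 0) {
    throw new RangeError("worldPrevalence must be greater than 0");
  }
  // Observed number of audience members carrying the trait.
  const observed = audiencePrevalence * audienceSize;
  // Posterior prevalence: observed counts blended with the world-baseline prior.
  const posterior =
    (observed + priorStrength * worldPrevalence) / (audienceSize + priorStrength);
  return posterior / worldPrevalence;
}
```

Under these assumptions, a trait with audience prevalence 0.45 and world prevalence 0.12 has a raw ratio of 3.75. With audienceSize = 100 and priorStrength = 100 the sketch gives a lift of about 2.375, and the value moves toward 3.75 as audienceSize grows.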
Example Response:
{
"lift_scores": [
{
"trait_hash": "abc123def456",
"trait_name": "golf_affinity",
"trait_value": "high",
"domain": "affinity",
"audience_prevalence": 0.45,
"world_prevalence": 0.12,
"lift": 3.75,
"under_represented": false
},
{
"trait_hash": "ghi789jkl012",
"trait_name": "income_range",
"trait_value": "150000_plus",
"domain": "demographic",
"audience_prevalence": 0.32,
"world_prevalence": 0.08,
"lift": 4.0,
"under_represented": false
}
],
"resourceLinks": [
{
"uri": "workflow://550e8400-e29b-41d4-a716-446655440000/artifacts/lift_scores.parquet",
"name": "lift_scores.parquet",
"mimeType": "application/parquet"
}
],
"tool_trace_id": "a1b2c3d4e5f6",
"workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}
Understanding Lift Scores
- Lift > 1: Trait is over-represented in your audience vs. the general population
- Lift = 1: Trait prevalence matches the general population
- Lift < 1: Trait is under-represented in your audience
- The further a lift value is from 1, in either direction, the more the trait distinguishes your audience
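One way to compare over- and under-represented traits on the same footing is to measure distance from 1 on a log scale, so that a lift of 4 and a lift of 0.25 count as equally distinguishing. The sketch below is a client-side illustration and not necessarily the ordering the tool applies to lift_scores; distinctiveness and rankByDistinctiveness are hypothetical helpers, and lift is assumed to be strictly positive.

```typescript
// Distance of a lift from 1 on a log scale; assumes lift > 0.
function distinctiveness(lift: number): number {
  return Math.abs(Math.log(lift));
}

// Returns a new array ordered from most to least distinguishing.
function rankByDistinctiveness<T extends { lift: number }>(rows: T[]): T[] {
  return [...rows].sort(
    (a, b) => distinctiveness(b.lift) - distinctiveness(a.lift)
  );
}
```

With this measure a trait at lift 0.2 ranks ahead of one at lift 3.75, because 0.2 is a fivefold departure from the baseline while 3.75 is less than fourfold.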
Common Errors
| Condition | Error message |
|---|---|
| Neither audience_frequencies nor trait_frequencies_uri provided | "Provide exactly one input: audience_frequencies or trait_frequencies_uri" |
| domains contains a value not allowed for the given entity_type | "Trait domains not allowed for entity_type='<entityType>'. Allowed: <allowed>. Violations: <violations>." |
Usage Examples
Example 1: From group_entities_by_trait output (recommended)
{
"entity_type": "person",
"trait_frequencies_uri": "workflow://550e8400-e29b-41d4-a716-446655440000/artifacts/trait_frequencies.parquet",
"workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}
Example 2: Inline frequencies
{
"entity_type": "person",
"audience_frequencies": [
{ "trait_hash": "abc123", "audience_prevalence": 0.45 },
{ "trait_hash": "def456", "audience_prevalence": 0.32 },
{ "trait_hash": "ghi789", "audience_prevalence": 0.28 }
],
"top_n": 10
}
Example 3: Without under-represented traits
{
"entity_type": "person",
"trait_frequencies_uri": "workflow://550e8400-e29b-41d4-a716-446655440000/artifacts/trait_frequencies.parquet",
"include_under_represented": false,
"top_n": 25
}
Geographic lift
Geo participates in calculate_trait_lift in two ways. Both reuse the existing tool surface and require no new parameters.
Geo traits in the input. A group_entities_by_trait run with "geo" in domains produces a trait_frequencies.parquet whose rows include boundary memberships (state, dma, county, etc.) alongside any other domains requested. Passing that parquet here ranks the boundaries that over-index — "this audience is concentrated in California and the Boston DMA" — in the same lift_scores array as the non-geo traits. Geo trait hashes resolve through the same lift pipeline; the world-prevalence lookup falls back to the geo catalog when a hash isn't in the main trait table, so geo entries don't surface TRAIT_LOOKUP_FAILURES.
Geo-scoped audiences. To answer "what's distinctive about my golfers in California", filter the audience to the region first (via entity_find with a geo trait hash, e.g. geo.state=CA AND interest.golf), then run group_entities_by_trait → calculate_trait_lift as usual. The lift is computed against the global world baseline; the audience is the CA cohort.
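The last step of that chain, handing the group_entities_by_trait artifact to calculate_trait_lift, can be sketched as a small request builder. This is illustrative only: buildLiftRequest is a hypothetical helper, and the URI pattern is inferred from the examples on this page. In practice, prefer the URI that group_entities_by_trait actually returns.

```typescript
// Subset of the request schema used when chaining from a parquet artifact.
interface ChainedLiftRequest {
  entity_type: "person" | "business";
  trait_frequencies_uri: string;
  workflow_id: string;
  top_n?: number;
}

// Hypothetical helper. The URI layout is an assumption based on the examples
// in this document, not a documented contract.
function buildLiftRequest(workflowId: string, topN?: number): ChainedLiftRequest {
  const request: ChainedLiftRequest = {
    entity_type: "person", // geo traits are person-only
    trait_frequencies_uri: `workflow://${workflowId}/artifacts/trait_frequencies.parquet`,
    workflow_id: workflowId,
  };
  if (topN !== undefined) {
    request.top_n = topN;
  }
  return request;
}
```

Passing the same workflow_id through keeps the lift_scores.parquet output in the same workflow as the upstream artifacts.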
Example: which states over-index for an audience
{
"entity_type": "person",
"trait_frequencies_uri": "workflow://550e8400-e29b-41d4-a716-446655440000/artifacts/trait_frequencies.parquet",
"top_n": 10
}
This assumes the upstream group_entities_by_trait call used domains: ["geo"]. Returned rows carry domain: "geo" and trait_name: "state" (or "dma", "county", etc., depending on which boundary types the aggregator hit).
Geo is person-only — entity_type: "business" rejects geo trait hashes at validation.
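Because geo rows share the lift_scores array with other domains, a client that wants only the over-indexing states has to filter the response. A minimal sketch, assuming the row fields shown under Output Format (overIndexingStates is a hypothetical helper):

```typescript
// Fields used here mirror the lift_scores entries described under Output Format.
interface LiftScoreRow {
  trait_name: string;
  trait_value: string;
  domain: string;
  lift: number;
  under_represented: boolean;
}

// Keeps only state-level geo rows that are over-represented (lift > 1).
function overIndexingStates(scores: LiftScoreRow[]): LiftScoreRow[] {
  return scores.filter(
    (row) => row.domain === "geo" && row.trait_name === "state" && row.lift > 1
  );
}
```

Swapping "state" for "dma" or "county" selects other boundary types in the same way.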