entity_traits — Watt Data Docs

Check trait membership for a set of entities against one or more labeled trait expressions. Returns a per-entity boolean matrix with one row per entity and one column per expression label, where 1 means the entity matches the expression and 0 means it does not. Uses the same boolean expression syntax as entity_find.

Quick Example

{
  "entity_type": "person",
  "entity_ids": ["123", "456", "789"],
  "expressions": [
    { "label": "golf_fans", "expression": "6003037" },
    { "label": "high_income_golf_fans", "expression": "6003037 AND 2000007" }
  ]
}

Input Parameters

Parameter	Type	Required	Default	Constraints	Description
entity_type	string	Yes	-	"person" or "business"	Type of entity being checked
entity_ids	array	Conditional	-	Array of strings or integers, max 1000	Entity IDs (inline mode). Mutually exclusive with csv_resource_uri
csv_resource_uri	string	Conditional	-	workflow:// URI ending in .csv or .parquet, max 200,000 IDs	CSV or Parquet file containing entity IDs. Mutually exclusive with entity_ids
entity_id_column	string	No	"entity_id"	Column name	Column containing entity IDs (only with csv_resource_uri)
expressions	array	Yes	-	1-100 entries of `{ label, expression }`	Labeled boolean trait expressions to evaluate per entity
include_unmatched	boolean	No	true	-	If false, drop entities that match no expression
include_identifiers	array	No	-	Subset of `name`, `email`, `phone`, `address`, `maid`	Identifier columns to include alongside entity_id
format	string	No	"none"	"none", "csv", "json", "jsonl"	Export format. Non-`none` values produce a presigned download URL valid for 1 hour
workflow_id	string	Conditional	-	Valid UUID	Required when format is not "none"

Parameter Details:

entity_ids vs csv_resource_uri:

Provide exactly one. They are mutually exclusive.
entity_ids for small inline batches (capped at 1000 IDs)
csv_resource_uri for chaining from entity_resolve or entity_find output; recommended for larger sets and capped at 200,000 entity IDs
csv_resource_uri supports both .csv and .parquet files
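These source rules can be checked client-side before calling the tool. The sketch below is a hypothetical checkEntitySource helper, not part of the Watt Data API. The messages for the mutual-exclusivity, missing-source, and file-extension cases copy the errors listed under Error Handling; the messages for the 1000-ID cap and the workflow:// prefix are illustrative, because the tool's own wording for those cases is not documented.

```typescript
// Hypothetical client-side pre-flight check; not part of the Watt Data API.
// Messages for documented cases mirror the Error Handling section.
interface EntitySourceInput {
  entity_ids?: Array<string | number>;
  csv_resource_uri?: string;
}

const MAX_INLINE_IDS = 1000;

// Returns an error message, or null when the source parameters look valid.
function checkEntitySource({ entity_ids, csv_resource_uri }: EntitySourceInput): string | null {
  if (entity_ids !== undefined && csv_resource_uri !== undefined) {
    return "entity_ids and csv_resource_uri are mutually exclusive";
  }
  if (entity_ids === undefined && csv_resource_uri === undefined) {
    return "Either entity_ids or csv_resource_uri must be provided";
  }
  if (entity_ids !== undefined && entity_ids.length > MAX_INLINE_IDS) {
    // Illustrative wording: the tool's message for this case is not documented.
    return `entity_ids accepts at most ${MAX_INLINE_IDS} IDs; use csv_resource_uri instead`;
  }
  if (csv_resource_uri !== undefined) {
    if (!csv_resource_uri.startsWith("workflow://")) {
      // Illustrative wording, as above.
      return `csv_resource_uri must be a workflow:// URI, got: ${csv_resource_uri}`;
    }
    if (!csv_resource_uri.endsWith(".csv") && !csv_resource_uri.endsWith(".parquet")) {
      return `csv_resource_uri must point to a .csv or .parquet file, got: ${csv_resource_uri}`;
    }
  }
  return null;
}
```

Running the check locally only saves a round trip. The tool still validates server-side.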

expressions:

Each entry is { "label": "<column-name>", "expression": "<boolean-expression>" }
label becomes a column name in the output matrix; labels must be unique
expression uses the same boolean syntax as entity_find
1-100 expressions per call
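The label rules can be pre-checked in the same way. The sketch below uses a hypothetical checkExpressions helper. Its duplicate-label message matches the documented error. The wording for the 1-100 bound is an assumption, because that message is not documented.

```typescript
// Hypothetical pre-flight check for the `expressions` array; not part of the Watt Data API.
interface LabeledExpression {
  label: string;
  expression: string;
}

function checkExpressions(expressions: LabeledExpression[]): string | null {
  if (expressions.length < 1 || expressions.length > 100) {
    // Illustrative wording; the tool's own message for this case is not documented.
    return `expressions must contain 1-100 entries, got ${expressions.length}`;
  }
  // Labels become output column names, so each must be unique.
  const seen = new Set<string>();
  for (const { label } of expressions) {
    if (seen.has(label)) {
      return `Duplicate expression label: "${label}"`; // documented error text
    }
    seen.add(label);
  }
  return null;
}
```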

Boolean expression syntax (same as entity_find):

Trait IDs (numeric) or trait hashes (32-character hexadecimal strings; the hashes in the examples below are shortened placeholders)
Supports AND, OR, NOT, parentheses for grouping
Mixing trait IDs and trait hashes is allowed
Discover valid trait IDs via trait_search or browse trait:// resources

"6003037"                                  // Single trait
"6003037 AND 2000007"                      // Both traits
"(6003034 OR 6003037) AND NOT 2000012"     // Grouped boolean
"abc123 AND def456"                        // Trait hashes

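To make the per-entity 0/1 semantics concrete, the sketch below evaluates one expression against the set of trait IDs or hashes a single entity holds. It only illustrates the idea; the tool evaluates expressions server-side. Three details are assumptions because the source does not document them: the operator precedence (NOT binds tighter than AND, and AND tighter than OR), uppercase-only keywords, and case-insensitive hex hashes. The evaluate and tokenize names are hypothetical.

```typescript
// Sketch: evaluate an entity_find-style expression against one entity's traits.
// Assumptions (not documented): precedence NOT > AND > OR, uppercase keywords,
// identifier tokens are numeric IDs or 32-character hex hashes.
const OPS = ["(", ")", "AND", "OR", "NOT"] as const;
type Op = (typeof OPS)[number];
type Token = { kind: Op } | { kind: "ID"; value: string };

const ID_PATTERN = /^(\d+|[0-9a-fA-F]{32})$/;

function isOp(s: string): s is Op {
  return (OPS as readonly string[]).indexOf(s) !== -1;
}

function tokenize(expr: string): Token[] {
  const tokens: Token[] = [];
  // Parentheses are tokens on their own; everything else splits on whitespace.
  for (const raw of expr.match(/\(|\)|[^\s()]+/g) ?? []) {
    if (isOp(raw)) tokens.push({ kind: raw });
    else if (ID_PATTERN.test(raw)) tokens.push({ kind: "ID", value: raw });
    else throw new Error(`Invalid cluster identifier "${raw}". Must be a numeric cluster ID or a 32-character hex cluster hash`);
  }
  return tokens;
}

// Recursive descent: or := and ("OR" and)* ; and := not ("AND" not)* ;
// not := "NOT" not | atom ; atom := ID | "(" or ")"
function evaluate(expr: string, traits: Set<string>): 0 | 1 {
  const tokens = tokenize(expr);
  let pos = 0;
  const peekKind = (): string | undefined => tokens[pos]?.kind;

  function parseOr(): boolean {
    let value = parseAnd();
    while (peekKind() === "OR") {
      pos++;
      const rhs = parseAnd(); // always parse the right side so every token is consumed
      value = value || rhs;
    }
    return value;
  }

  function parseAnd(): boolean {
    let value = parseNot();
    while (peekKind() === "AND") {
      pos++;
      const rhs = parseNot();
      value = value && rhs;
    }
    return value;
  }

  function parseNot(): boolean {
    if (peekKind() === "NOT") {
      pos++;
      return !parseNot();
    }
    return parseAtom();
  }

  function parseAtom(): boolean {
    const tok = tokens[pos++];
    if (tok === undefined) throw new Error("Unexpected end of expression");
    if (tok.kind === "ID") return traits.has(tok.value);
    if (tok.kind === "(") {
      const value = parseOr();
      if (tokens[pos++]?.kind !== ")") throw new Error("Missing closing parenthesis");
      return value;
    }
    throw new Error(`Unexpected token "${tok.kind}"`);
  }

  const result = parseOr();
  if (pos !== tokens.length) throw new Error("Unexpected trailing tokens");
  return result ? 1 : 0;
}
```

For an entity holding traits 6003037 and 2000007, "(6003034 OR 6003037) AND NOT 2000012" evaluates to 1. Under the assumed precedence, "6003037 OR 6003034 AND 2000012" also evaluates to 1, because it is read as 6003037 OR (6003034 AND 2000012). Adding explicit parentheses avoids relying on precedence at all.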
include_identifiers:

Adds the entity's primary identifier of each requested type to the output row (e.g., email1, phone1)
Useful for re-joining the matrix with the original input file
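One way to re-join is a left join on a shared identifier, for example the original file's email column against the matrix's email1 column. The sketch below is a hypothetical helper. Its row shapes and column names are illustrative, and it matches keys exactly without normalizing case or whitespace.

```typescript
// Hypothetical helper: re-join entity_traits output rows with the caller's original
// rows on a shared key, e.g. original "email" vs. matrix "email1".
type Row = Record<string, string | number>;

function rejoin(original: Row[], originalKey: string, matrix: Row[], matrixKey: string): Row[] {
  // Index matrix rows by the join key; values are stringified so 123 and "123" match.
  const index = new Map<string, Row>();
  for (const row of matrix) {
    const key = row[matrixKey];
    if (key !== undefined) index.set(String(key), row);
  }
  // Left join: every original row is kept; matrix columns are added when a match exists.
  return original.map((row) => {
    const key = row[originalKey];
    const match = key === undefined ? undefined : index.get(String(key));
    return match ? { ...row, ...match } : { ...row };
  });
}
```

Joining on entity_id instead works the same way, with "entity_id" passed as both keys.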

format and workflow_id:

format: "none" (default) returns an inline summary plus a 10-row sample
format: "csv" | "json" | "jsonl" streams the full matrix to S3 and returns a presigned export.url valid for 1 hour, plus a workflow:// resource URI for downstream tools
A workflow_id is required whenever format is not "none"
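The export rule can be pre-checked as well. The sketch below is a hypothetical checkExport helper. The missing-workflow_id message is the documented one. The UUID pattern is a loose 8-4-4-4-12 hex check and may be more or less strict than the tool's own validation, whose message is not documented.

```typescript
// Hypothetical pre-flight check for the export rules; not part of the Watt Data API.
type ExportFormat = "none" | "csv" | "json" | "jsonl";

// Loose 8-4-4-4-12 hex check; the tool's own UUID validation may differ (assumption).
const UUID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function checkExport(format: ExportFormat = "none", workflowId?: string): string | null {
  if (workflowId !== undefined && !UUID_PATTERN.test(workflowId)) {
    // Illustrative wording: this message is not documented.
    return `workflow_id must be a valid UUID, got: ${workflowId}`;
  }
  if (format !== "none" && workflowId === undefined) {
    // Documented error text.
    return "workflow_id is required when format is not 'none'. Provide a workflow_id to enable export.";
  }
  return null; // "none": inline summary + sample; other formats: presigned export URL
}
```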

Request Schema:

interface EntityTraitsParams {
  entity_type: "person" | "business";
  entity_ids?: Array<string | number>;
  csv_resource_uri?: string;
  entity_id_column?: string;
  expressions: Array<{
    label: string;
    expression: string;
  }>;
  include_unmatched?: boolean;
  include_identifiers?: Array<"name" | "email" | "phone" | "address" | "maid">;
  format?: "none" | "csv" | "json" | "jsonl";
  workflow_id?: string;  // Valid UUID; required when format is not "none"
}
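EntityTraitsParams leaves two cross-field rules to runtime validation: exactly one ID source must be provided, and workflow_id is required when format is not "none". If you generate requests from TypeScript, discriminated unions can encode both rules at compile time. The sketch below assumes that approach. The names StrictEntityTraitsParams and sourceKind are hypothetical and do not come from any published Watt Data SDK; the field names follow EntityTraitsParams.

```typescript
// Sketch: encode "exactly one of entity_ids / csv_resource_uri" and
// "workflow_id when format is not none" in the type system.
type Identifier = "name" | "email" | "phone" | "address" | "maid";

type EntitySource =
  | { entity_ids: Array<string | number>; csv_resource_uri?: never; entity_id_column?: never }
  | { csv_resource_uri: string; entity_id_column?: string; entity_ids?: never };

type ExportOptions =
  | { format?: "none"; workflow_id?: string }
  | { format: "csv" | "json" | "jsonl"; workflow_id: string };

type StrictEntityTraitsParams = EntitySource &
  ExportOptions & {
    entity_type: "person" | "business";
    expressions: Array<{ label: string; expression: string }>;
    include_unmatched?: boolean;
    include_identifiers?: Identifier[];
  };

// Narrowing on the source field tells a caller which mode a request uses.
function sourceKind(p: StrictEntityTraitsParams): "inline" | "file" {
  return p.entity_ids !== undefined ? "inline" : "file";
}

// Compiles: a trimmed version of Example 2.
const chained: StrictEntityTraitsParams = {
  entity_type: "person",
  csv_resource_uri: "workflow://550e8400-e29b-41d4-a716-446655440000/artifacts/audience.csv",
  expressions: [{ label: "golf_fans", expression: "6003037" }],
  format: "csv",
  workflow_id: "550e8400-e29b-41d4-a716-446655440000",
};

// Would not compile: format "csv" without workflow_id.
// const bad: StrictEntityTraitsParams = { entity_type: "person", entity_ids: ["123"], expressions: [], format: "csv" };
```

The 1000-ID and 1-100 expression limits cannot be expressed this way and still need runtime checks.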

Output Format

interface EntityTraitsResult {
  total_entities: number;
  matched_entities: number;
  expression_counts: Record<string, number>;
  sample: Array<{
    entity_id: string;
    // Identifier columns (e.g., email1, phone1) when include_identifiers is set,
    // plus one 0/1 column per expression label
    [column: string]: string | number;
  }>;
  export?: {
    url: string;
    format: "csv" | "json" | "jsonl";
    rows: number;
    expires_at: string;
    resource_uri: string;
  };
  tool_trace_id: string;
  workflow_id: string;
}

Response Fields:

Field	Type	Description
total_entities	number	Number of input entities scored (after `include_unmatched` filtering)
matched_entities	number	Entities matching at least one expression
expression_counts	object	Per-label match count, keyed by expression label
sample	array	Up to 10 matrix rows: `entity_id`, optional identifier columns, one `0`/`1` column per expression label
export	object	Present only when `format` is `csv`, `json`, or `jsonl`
export.url	string	Presigned download URL, valid for 1 hour
export.format	string	Export format: `csv`, `json`, or `jsonl`
export.resource_uri	string	`workflow://` URI for chaining into downstream tools
export.rows	number	Total rows written to the export
export.expires_at	string	ISO-8601 expiry of the presigned URL
tool_trace_id	string	OpenTelemetry trace ID
workflow_id	string	Workflow session identifier
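The summary fields follow directly from the matrix rows. The sketch below recomputes them from a complete set of rows, such as a parsed jsonl export. It assumes the caller knows which columns are expression labels, because identifier columns such as email1 appear in the same rows. The summarize name is hypothetical.

```typescript
// Sketch: recompute the summary fields from full matrix rows (e.g., a parsed jsonl export).
// Row shape follows the sample rows described above; summarize is a hypothetical name.
type MatrixRow = { entity_id: string } & Record<string, string | number>;

interface MatrixSummary {
  total_entities: number;
  matched_entities: number;
  expression_counts: Record<string, number>;
}

function summarize(rows: MatrixRow[], labels: string[]): MatrixSummary {
  const expression_counts: Record<string, number> = {};
  for (const label of labels) expression_counts[label] = 0;

  let matched_entities = 0;
  for (const row of rows) {
    let matchedAny = false;
    for (const label of labels) {
      if (row[label] === 1) {
        expression_counts[label] += 1; // per-label match count
        matchedAny = true;
      }
    }
    if (matchedAny) matched_entities += 1; // matches at least one expression
  }
  return { total_entities: rows.length, matched_entities, expression_counts };
}
```

With include_unmatched set to false, every remaining row matches at least one expression, so total_entities equals matched_entities.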

Example Response:

{
  "total_entities": 500,
  "matched_entities": 245,
  "expression_counts": {
    "golf_fans": 245,
    "high_income_golf_fans": 132
  },
  "sample": [
    {
      "entity_id": "123456",
      "email1": "alice@example.com",
      "golf_fans": 1,
      "high_income_golf_fans": 1
    },
    {
      "entity_id": "789012",
      "email1": "bob@example.com",
      "golf_fans": 1,
      "high_income_golf_fans": 0
    }
  ],
  "tool_trace_id": "a1b2c3d4e5f6",
  "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}

Error Handling

Common Errors:

Both entity_ids and csv_resource_uri provided: "entity_ids and csv_resource_uri are mutually exclusive"
Neither provided: "Either entity_ids or csv_resource_uri must be provided"
csv_resource_uri does not end in .csv or .parquet: "csv_resource_uri must point to a .csv or .parquet file, got: <uri>"
Duplicate label in expressions: "Duplicate expression label: \"<label>\""
format set without workflow_id: "workflow_id is required when format is not 'none'. Provide a workflow_id to enable export."
Unknown trait hash(es) in an expression (error messages refer to traits as clusters): "Unknown cluster hash(es): <hash-list>. Use trait_search to discover valid trait hashes before building expressions."
Invalid identifier token in an expression (neither a numeric ID nor a 32-character hex hash): "Invalid cluster identifier \"<token>\". Must be a numeric cluster ID or a 32-character hex cluster hash"

Usage Examples

Example 1: Inline entity IDs with two expressions

{
  "entity_type": "person",
  "entity_ids": ["123", "456", "789"],
  "expressions": [
    { "label": "golf_fans", "expression": "6003037" },
    { "label": "high_income_golf_fans", "expression": "6003037 AND 2000007" }
  ]
}

Example 2: Chained from entity_find via csv_resource_uri (CSV export)

{
  "entity_type": "person",
  "csv_resource_uri": "workflow://550e8400-e29b-41d4-a716-446655440000/artifacts/audience.csv",
  "entity_id_column": "entity_id",
  "expressions": [
    { "label": "golf_fans", "expression": "6003037" },
    { "label": "luxury_travel", "expression": "(6003034 OR 6003037) AND NOT 2000012" }
  ],
  "include_identifiers": ["email", "phone"],
  "format": "csv",
  "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}

Example 3: Chained from entity_resolve, only matched entities (JSONL export)

{
  "entity_type": "person",
  "csv_resource_uri": "workflow://550e8400-e29b-41d4-a716-446655440000/artifacts/resolved_identities.parquet",
  "entity_id_column": "entity_id",
  "expressions": [
    { "label": "high_value", "expression": "6003037 AND 2000007" }
  ],
  "include_unmatched": false,
  "format": "jsonl",
  "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}
