Watt Data

Check trait membership for a set of entities against one or more labeled trait expressions. Returns a per-entity boolean matrix where 1 means the entity matches the expression and 0 means it does not. Uses the same boolean expression syntax as entity_find.

Quick Example

{
  "entity_type": "person",
  "entity_ids": ["123", "456", "789"],
  "expressions": [
    { "label": "golf_fans", "expression": "6003037" },
    { "label": "high_income_golf_fans", "expression": "6003037 AND 2000007" }
  ]
}

Input Parameters

| Parameter | Type | Required | Default | Constraints | Description |
|---|---|---|---|---|---|
| entity_type | string | Yes | - | "person" or "business" | Type of entity being checked |
| entity_ids | array | Conditional | - | Array of strings or integers, max 1000 | Entity IDs (inline mode). Mutually exclusive with csv_resource_uri |
| csv_resource_uri | string | Conditional | - | workflow:// URI | CSV or Parquet file containing entity IDs. Mutually exclusive with entity_ids |
| entity_id_column | string | No | "entity_id" | Column name | Column containing entity IDs (only with csv_resource_uri) |
| expressions | array | Yes | - | 1-100 entries of { label, expression } | Labeled boolean trait expressions to evaluate per entity |
| include_unmatched | boolean | No | true | - | If false, drop entities that match no expression |
| include_identifiers | array | No | - | Subset of name, email, phone, address, maid | Identifier columns to include alongside entity_id |
| format | string | No | "none" | "none", "csv", "json", "jsonl" | Export format. Non-none values produce a presigned download URL valid for 1 hour |
| workflow_id | string | Conditional | - | Valid UUID | Required when format is not "none" |

Parameter Details:

entity_ids vs csv_resource_uri:

  • Provide exactly one. They are mutually exclusive.
  • Use entity_ids for small inline batches (capped at 1,000 IDs)
  • Use csv_resource_uri to chain from entity_resolve or entity_find output (recommended for larger sets); limited to 200,000 entity IDs
  • csv_resource_uri supports both .csv and .parquet files
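
The input-mode rules above can be checked client-side before a request is sent. The following is a minimal sketch, not part of the API: `resolveIdSource` and the `IdSource` type are hypothetical names, and the message for exceeding the inline cap is assumed wording (the other messages are copied from the Error Handling section). The server remains the source of truth.

```typescript
// Hypothetical client-side pre-check for the two input modes.
type IdSource =
  | { mode: "inline"; entity_ids: Array<string | number> }
  | { mode: "file"; csv_resource_uri: string };

const MAX_INLINE_IDS = 1000; // documented cap for entity_ids

function resolveIdSource(params: {
  entity_ids?: Array<string | number>;
  csv_resource_uri?: string;
}): IdSource {
  const { entity_ids, csv_resource_uri } = params;
  if (entity_ids !== undefined && csv_resource_uri !== undefined) {
    throw new Error("entity_ids and csv_resource_uri are mutually exclusive");
  }
  if (entity_ids !== undefined) {
    if (entity_ids.length > MAX_INLINE_IDS) {
      // Assumed wording: this case is not listed among the documented errors.
      throw new Error(`entity_ids exceeds ${MAX_INLINE_IDS} entries`);
    }
    return { mode: "inline", entity_ids };
  }
  if (csv_resource_uri !== undefined) {
    if (!/\.(csv|parquet)$/.test(csv_resource_uri)) {
      throw new Error(
        `csv_resource_uri must point to a .csv or .parquet file, got: ${csv_resource_uri}`
      );
    }
    return { mode: "file", csv_resource_uri };
  }
  throw new Error("Either entity_ids or csv_resource_uri must be provided");
}
```

Returning a tagged union lets the caller branch on `mode` without re-checking which field is present.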

expressions:

  • Each entry is { "label": "<column-name>", "expression": "<boolean-expression>" }
  • label becomes a column name in the output matrix; labels must be unique
  • expression uses the same boolean syntax as entity_find
  • 1-100 expressions per call
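
As an illustration of the constraints above, the sketch below checks the entry count and label uniqueness locally. `checkExpressions` is a hypothetical helper; the duplicate-label message matches the documented error, while the count message is assumed wording.

```typescript
interface LabeledExpression {
  label: string;
  expression: string;
}

// Hypothetical pre-check: 1-100 entries, unique labels.
function checkExpressions(expressions: LabeledExpression[]): void {
  if (expressions.length < 1 || expressions.length > 100) {
    // Assumed wording: the documented errors do not cover this case.
    throw new Error(`expressions must contain 1-100 entries, got ${expressions.length}`);
  }
  const seen = new Set<string>();
  for (const { label } of expressions) {
    if (seen.has(label)) {
      throw new Error(`Duplicate expression label: "${label}"`);
    }
    seen.add(label);
  }
}
```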

Boolean expression syntax (same as entity_find):

  • Operands are trait IDs (numeric) or trait hashes (32-character hex strings)
  • Supports AND, OR, NOT, parentheses for grouping
  • Mixing trait IDs and trait hashes is allowed
  • Discover valid trait IDs via trait_search or browse trait:// resources
"6003037"                                  // Single trait
"6003037 AND 2000007"                      // Both traits
"(6003034 OR 6003037) AND NOT 2000012"     // Grouped boolean
"abc123 AND def456"                        // Trait hashes

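To make the 0/1 semantics of an expression concrete, here is a small local evaluator that tests one expression against the set of traits an entity holds. It is an illustrative sketch, not the service's parser: the function name `matches` is hypothetical, and it assumes uppercase operators and the conventional precedence (NOT binds tighter than AND, which binds tighter than OR), neither of which is stated in this reference.

```typescript
// Illustrative evaluator. Assumed precedence: NOT > AND > OR.
// Characters other than letters, digits, and parentheses are ignored by this sketch.
function matches(expression: string, traits: Set<string>): 0 | 1 {
  const tokens: string[] = expression.match(/\(|\)|[A-Za-z0-9]+/g) ?? [];
  let pos = 0;

  function parseOr(): boolean {
    let value = parseAnd();
    while (tokens[pos] === "OR") {
      pos++;
      const right = parseAnd(); // always consume the right-hand side
      value = value || right;
    }
    return value;
  }

  function parseAnd(): boolean {
    let value = parseNot();
    while (tokens[pos] === "AND") {
      pos++;
      const right = parseNot();
      value = value && right;
    }
    return value;
  }

  function parseNot(): boolean {
    if (tokens[pos] === "NOT") {
      pos++;
      return !parseNot();
    }
    return parseAtom();
  }

  function parseAtom(): boolean {
    const token = tokens[pos++];
    if (token === undefined) throw new Error("Unexpected end of expression");
    if (token === "(") {
      const value = parseOr();
      if (tokens[pos++] !== ")") throw new Error("Missing closing parenthesis");
      return value;
    }
    if (token === ")" || token === "AND" || token === "OR") {
      throw new Error(`Unexpected token "${token}"`);
    }
    return traits.has(token); // trait ID or trait hash
  }

  const result = parseOr();
  if (pos !== tokens.length) throw new Error(`Unexpected token "${tokens[pos]}"`);
  return result ? 1 : 0;
}
```

For an entity holding traits 6003037 and 2000007, this sketch yields 1 for "(6003034 OR 6003037) AND NOT 2000012" and 0 for "NOT 6003037".
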
include_identifiers:

  • Adds the entity's primary identifier of each requested type to the output row (e.g., email1, phone1)
  • Useful for re-joining the matrix with the original input file
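
A re-join along these lines might look like the following sketch. It assumes the original input records carry an `email` field and that matrix rows expose the identifier as `email1` (the column name used in the example above); `joinByEmail` and the trim-and-lowercase normalization are illustrative choices, not documented behavior.

```typescript
type MatrixRow = Record<string, string | number>;

// Hypothetical join: attach each original record to its matrix row by email.
function joinByEmail<T extends { email: string }>(
  originals: T[],
  rows: MatrixRow[]
): Array<{ original: T; row: MatrixRow | undefined }> {
  const normalize = (s: string): string => s.trim().toLowerCase();
  const byEmail = new Map<string, MatrixRow>();
  for (const row of rows) {
    const email = row["email1"];
    if (typeof email === "string") {
      byEmail.set(normalize(email), row);
    }
  }
  // Records without a matching row keep row: undefined.
  return originals.map((original) => ({
    original,
    row: byEmail.get(normalize(original.email)),
  }));
}
```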

format and workflow_id:

  • format: "none" (default) returns an inline summary plus a 10-row sample
  • format: "csv" | "json" | "jsonl" streams the full matrix to S3 and returns a presigned export.url valid for 1 hour, plus a workflow:// resource URI for downstream tools
  • A workflow_id is required whenever format is not "none"
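
The dependency between format and workflow_id can be enforced when building a request. The sketch below uses a hypothetical helper, `exportOptions`, with a loose UUID shape check; the server's exact UUID validation is not described here, and the invalid-UUID message is assumed wording.

```typescript
type ExportFormat = "none" | "csv" | "json" | "jsonl";

// Loose 8-4-4-4-12 hex shape check; an assumption, not the server's rule.
const UUID_SHAPE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function exportOptions(
  format: ExportFormat = "none",
  workflowId?: string
): { format: ExportFormat; workflow_id?: string } {
  if (format !== "none" && workflowId === undefined) {
    throw new Error(
      "workflow_id is required when format is not 'none'. Provide a workflow_id to enable export."
    );
  }
  if (workflowId !== undefined && !UUID_SHAPE.test(workflowId)) {
    // Assumed wording for an invalid UUID.
    throw new Error(`workflow_id is not a valid UUID: ${workflowId}`);
  }
  return workflowId === undefined ? { format } : { format, workflow_id: workflowId };
}
```

The result can be spread into the rest of the request parameters, for example `{ ...baseParams, ...exportOptions("csv", workflowId) }`.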

Request Schema:

interface EntityTraitsParams {
  entity_type: "person" | "business";
  entity_ids?: Array<string | number>;
  csv_resource_uri?: string;
  entity_id_column?: string;
  expressions: Array<{
    label: string;
    expression: string;
  }>;
  include_unmatched?: boolean;
  include_identifiers?: Array<"name" | "email" | "phone" | "address" | "maid">;
  format?: "none" | "csv" | "json" | "jsonl";
  workflow_id?: string;  // Valid UUID; required when format is not "none"
}

Output Format

{
  total_entities: number;
  matched_entities: number;
  expression_counts: Record<string, number>;
  sample: Array<{
    entity_id: string;
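    // Additional columns: identifier values (e.g., email1, phone1) when
    // include_identifiers is set, plus one 0 | 1 column per expression label.
    // A single index signature is used because TypeScript rejects two with the same key type.
    [column: string]: string | number;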
  }>;
  export?: {
    url: string;
    format: "csv" | "json" | "jsonl";
    rows: number;
    expires_at: string;
    resource_uri: string;
  };
  tool_trace_id: string;
  workflow_id: string;
}

Response Fields:

| Field | Type | Description |
|---|---|---|
| total_entities | number | Number of input entities scored (after include_unmatched filtering) |
| matched_entities | number | Entities matching at least one expression |
| expression_counts | object | Per-label match count, keyed by expression label |
| sample | array | Up to 10 matrix rows: entity_id, optional identifier columns, one 0/1 column per expression label |
| export | object | Present only when format is csv, json, or jsonl |
| export.url | string | Presigned download URL, valid for 1 hour |
| export.resource_uri | string | workflow:// URI for chaining into downstream tools |
| export.rows | number | Total rows written to the export |
| export.expires_at | string | ISO-8601 expiry of the presigned URL |
| tool_trace_id | string | OpenTelemetry trace ID |
| workflow_id | string | Workflow session identifier |

Example Response:

{
  "total_entities": 500,
  "matched_entities": 245,
  "expression_counts": {
    "golf_fans": 245,
    "high_income_golf_fans": 132
  },
  "sample": [
    {
      "entity_id": "123456",
      "email1": "alice@example.com",
      "golf_fans": 1,
      "high_income_golf_fans": 1
    },
    {
      "entity_id": "789012",
      "email1": "bob@example.com",
      "golf_fans": 1,
      "high_income_golf_fans": 0
    }
  ],
  "tool_trace_id": "a1b2c3d4e5f6",
  "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}
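
The summary fields are enough to derive a per-label match rate without downloading the full matrix. A minimal sketch follows, with `matchRates` and `TraitSummary` as hypothetical names:

```typescript
interface TraitSummary {
  total_entities: number;
  expression_counts: Record<string, number>;
}

// Fraction of scored entities matching each expression label.
function matchRates(summary: TraitSummary): Record<string, number> {
  const rates: Record<string, number> = {};
  for (const label of Object.keys(summary.expression_counts)) {
    rates[label] =
      summary.total_entities === 0
        ? 0
        : summary.expression_counts[label] / summary.total_entities;
  }
  return rates;
}
```

Because total_entities is counted after include_unmatched filtering, rates computed with include_unmatched set to false are relative to matched entities only, not to the full input.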

Error Handling

Common Errors:

  • Both entity_ids and csv_resource_uri provided: "entity_ids and csv_resource_uri are mutually exclusive"
  • Neither provided: "Either entity_ids or csv_resource_uri must be provided"
  • csv_resource_uri does not end in .csv or .parquet: "csv_resource_uri must point to a .csv or .parquet file, got: <uri>"
  • Duplicate label in expressions: "Duplicate expression label: \"<label>\""
  • format set to "csv", "json", or "jsonl" without workflow_id: "workflow_id is required when format is not 'none'. Provide a workflow_id to enable export."
  • Unknown cluster hash(es) in an expression: "Unknown cluster hash(es): <hash-list>. Use trait_search to discover valid trait hashes before building expressions."
  • Invalid cluster identifier token (not a numeric ID or 32-character hex hash): "Invalid cluster identifier \"<token>\". Must be a numeric cluster ID or a 32-character hex cluster hash"
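
The invalid-identifier error can often be caught before a request is sent. The sketch below lists tokens in an expression that are neither an operator, a numeric ID, nor a 32-character hex hash. `invalidTokens` is a hypothetical helper; treating operators as uppercase-only and hex digits as case-insensitive are assumptions. It cannot detect unknown hashes, which only the service can resolve.

```typescript
const OPERATORS = new Set(["AND", "OR", "NOT"]);

// Returns the tokens that would likely trigger "Invalid cluster identifier".
function invalidTokens(expression: string): string[] {
  const tokens = expression
    .replace(/[()]/g, " ") // parentheses only group; drop them
    .split(/\s+/)
    .filter((t) => t.length > 0);
  return tokens.filter(
    (t) =>
      !OPERATORS.has(t) &&
      !/^\d+$/.test(t) && // numeric ID
      !/^[0-9a-fA-F]{32}$/.test(t) // 32-character hex hash
  );
}
```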

Usage Examples

Example 1: Inline entity IDs with two expressions

{
  "entity_type": "person",
  "entity_ids": ["123", "456", "789"],
  "expressions": [
    { "label": "golf_fans", "expression": "6003037" },
    { "label": "high_income_golf_fans", "expression": "6003037 AND 2000007" }
  ]
}

Example 2: Chained from entity_find via csv_resource_uri (CSV export)

{
  "entity_type": "person",
  "csv_resource_uri": "workflow://550e8400-e29b-41d4-a716-446655440000/artifacts/audience.csv",
  "entity_id_column": "entity_id",
  "expressions": [
    { "label": "golf_fans", "expression": "6003037" },
    { "label": "luxury_travel", "expression": "(6003034 OR 6003037) AND NOT 2000012" }
  ],
  "include_identifiers": ["email", "phone"],
  "format": "csv",
  "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}

Example 3: Chained from entity_resolve, only matched entities

{
  "entity_type": "person",
  "csv_resource_uri": "workflow://550e8400-e29b-41d4-a716-446655440000/artifacts/resolved_identities.parquet",
  "entity_id_column": "entity_id",
  "expressions": [
    { "label": "high_value", "expression": "6003037 AND 2000007" }
  ],
  "include_unmatched": false,
  "format": "jsonl",
  "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}
