entity_find — Watt Data Docs

Find entities that match trait criteria and/or a geographic location filter, returning a sample of results with optional export.

Quick Example

{
  "entity_type": "person",
  "expression": "1000000001 AND 1000000002"
}

Input Parameters

Parameter	Type	Required	Default	Constraints	Description
entity_type	string	Yes	-	"person" or "business"	Type of entity to search
expression	string	No	-	Boolean expression	Trait IDs/hashes with AND/OR/NOT operators
location	object	No	-	latitude/longitude/radius/unit	Geospatial filter
domains	array	No	["email"] (person), ["name"] (business)	Max 5 strings	Domains to include in results. Only entities with at least one matching domain are returned
audience_limit	number	No	unlimited (format='none'); 200,000 (export formats)	1 to 15,000,000	Maximum entities to return. For counting queries (format='none'), omit to get the full match count. Lower default for exports keeps requests within the server cost budget
offset	number	No	0	>= 0	Pagination offset. Requires workflow_id when > 0
format	string	No	"none"	"none", "csv", "json", "jsonl"	Export format
max_identifiers	number	No	3	1 to 10	Max columns per identifier type in CSV export (e.g. email1..emailN)
workflow_id	string	No	-	Valid UUID	Required when offset > 0 for deterministic ordering

Parameter Details:

expression:

Boolean expression using trait IDs (numeric) or trait hashes (32-character lowercase hex strings)
Supports: AND, OR, NOT, parentheses for grouping
Mixing trait IDs and trait hashes is allowed
Can be omitted for location-only queries
Important: Discover trait hashes before use — never guess or fabricate them. Non-geo hashes: use trait_search. Geo hashes (states, ZIPs, counties, DMAs, etc.): use trait_get(domain="geo", trait_name=<type>, trait_value=<value>) or browse trait://person?domain=geo.
Trait hashes must be exactly 32 lowercase hex characters (MD5 format), e.g. e3b0c44298fc1c149afbf4c8996fb924
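
The token rules above can be checked on the client before sending a request. The following is a minimal sketch, not the server's parser: checkExpression is a hypothetical helper, it assumes operators are written in uppercase, and it only verifies token shapes and parenthesis balance (it does not check operator placement).

```typescript
// Illustrative client-side pre-check; the server remains authoritative.
type CheckResult = { ok: true } | { ok: false; reason: string };

const TRAIT_ID = /^\d+$/;            // numeric trait ID
const TRAIT_HASH = /^[0-9a-f]{32}$/; // exactly 32 lowercase hex characters
const OPERATORS = new Set(["AND", "OR", "NOT"]); // assumption: uppercase only

function checkExpression(expression: string): CheckResult {
  // Split into single parentheses and runs of other non-space characters.
  const tokens = expression.match(/[()]|[^\s()]+/g) ?? [];
  if (tokens.length === 0) return { ok: false, reason: "empty expression" };

  let depth = 0;
  for (const token of tokens) {
    if (token === "(") {
      depth += 1;
    } else if (token === ")") {
      depth -= 1;
      if (depth < 0) return { ok: false, reason: "unmatched closing parenthesis" };
    } else if (!OPERATORS.has(token) && !TRAIT_ID.test(token) && !TRAIT_HASH.test(token)) {
      return { ok: false, reason: `unrecognized token "${token}"` };
    }
  }
  return depth === 0 ? { ok: true } : { ok: false, reason: "missing closing parenthesis" };
}
```

A check like this catches uppercase or truncated hashes early, but it cannot tell whether a well-formed hash actually exists; that still requires trait_search or trait_get.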

Expression syntax:

"123 AND 456"                                                          // Both traits
"(123 OR 456) AND NOT 789"                                            // Either of two traits, excluding one
"e3b0c44298fc1c149afbf4c8996fb924 AND 27ae41e4649b934ca495991b7852b855"  // Using trait hashes
"123 AND e3b0c44298fc1c149afbf4c8996fb924"                            // Mixing IDs and hashes

location:

Uses H3 resolution 9 cells (~0.2 km edge length), so radius matching is approximate

Example:

{
  "latitude": 37.7749,
  "longitude": -122.4194,
  "radius": 5,
  "unit": "km"
}

domains:

Controls which data domains are included in the results
Defaults to ["email"] for persons, ["name"] for businesses
Only entities with at least one matching domain are returned
Maximum 5 domains per request
Accepted values fall into three groups; anything outside them is rejected with a validation error:
Identifier-kind: name, email, phone, address, maid, website, social
Person trait-kind: affinity, content, demographic, employment, financial, household, intent, interest, lifestyle, political, purchase
Business trait-kind: about, appstore, digital, funding, hiring, industry, techstack

audience_limit:

Defaults: unlimited when format="none" (counting path — returned_count equals the full match count) and 200,000 for the csv / json / jsonl export formats. Hard maximum of 15,000,000 in either mode.
Results are ordered deterministically by workflow_id, so the same query with the same workflow_id returns the same sample across calls.
Export cost budget: export formats (csv, json, jsonl) enforce audience_limit × (channels + 2 × enrichment_domains) ≤ 3,000,000, where channels is the number of identifier-kind values in domains and enrichment_domains is the number of trait-kind values.
Resulting audience_limit caps for contact-only exports (identifier-kind domains only): 1 domain → 3,000,000; 2 → 1,500,000; 3 → 1,000,000; 4 → 750,000; 5 → 600,000.
If a request exceeds the budget, the server rejects it with an error that includes a suggested lower audience_limit. Retry with that value; do not change the expression to chase a smaller match count.
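
The budget rule can be turned into a quick client-side estimate of the largest audience_limit an export will accept. This is a sketch under stated assumptions: maxExportAudienceLimit is a hypothetical helper, it expects the effective domains list (with defaults already applied), and it treats every value that is not identifier-kind as a valid trait-kind domain. The server's suggested value in a rejection error takes precedence over anything computed here.

```typescript
// Identifier-kind domains, as listed in the domains section.
const IDENTIFIER_DOMAINS = new Set([
  "name", "email", "phone", "address", "maid", "website", "social",
]);
const EXPORT_BUDGET = 3000000;

// Largest audience_limit satisfying
// audience_limit * (channels + 2 * enrichment_domains) <= 3,000,000.
function maxExportAudienceLimit(domains: string[]): number {
  if (domains.length === 0) {
    throw new Error("pass the effective domains list, including defaults");
  }
  const channels = domains.filter((d) => IDENTIFIER_DOMAINS.has(d)).length;
  // Assumption: all remaining values are valid trait-kind domains.
  const enrichmentDomains = domains.length - channels;
  return Math.floor(EXPORT_BUDGET / (channels + 2 * enrichmentDomains));
}

const contactOnly = maxExportAudienceLimit(["email", "phone", "name"]);   // 1000000
const withTraits = maxExportAudienceLimit(["email", "phone", "demographic"]); // 750000
```

Because each trait-kind domain counts double, swapping one contact domain for a trait-kind domain lowers the cap even though the number of domains is unchanged.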

max_identifiers:

Controls how many columns per identifier type appear in CSV exports (e.g. email1..emailN, phone1..phoneN)
Default: 3, maximum: 10
Only applies when format is "csv"

Request Schema:

interface EntityFindParams {
  entity_type: "person" | "business";
  expression?: string;
  location?: {
    latitude: number;
    longitude: number;
    radius: number;
    unit: "km" | "miles";
  };
  domains?: string[];
  audience_limit?: number;
  offset?: number;
  format?: "none" | "csv" | "json" | "jsonl";
  max_identifiers?: number;
  workflow_id?: string;
}
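
The constraints from the parameter table can be checked before a request is sent. The sketch below is illustrative only: validateParams is a hypothetical helper, the interface is repeated so the block is self-contained, and the UUID pattern is a simple shape check rather than the server's own validation.

```typescript
// Mirrors the documented request schema.
interface EntityFindParams {
  entity_type: "person" | "business";
  expression?: string;
  location?: { latitude: number; longitude: number; radius: number; unit: "km" | "miles" };
  domains?: string[];
  audience_limit?: number;
  offset?: number;
  format?: "none" | "csv" | "json" | "jsonl";
  max_identifiers?: number;
  workflow_id?: string;
}

// Assumption: a plain 8-4-4-4-12 hex shape is enough for a client-side check.
const UUID_SHAPE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Returns a list of human-readable problems; empty when none are found.
function validateParams(p: EntityFindParams): string[] {
  const problems: string[] = [];
  if (p.expression === undefined && p.location === undefined) {
    problems.push("provide either an expression or a location filter");
  }
  if (p.domains !== undefined && p.domains.length > 5) {
    problems.push("domains accepts at most 5 values");
  }
  if (p.audience_limit !== undefined && (p.audience_limit < 1 || p.audience_limit > 15000000)) {
    problems.push("audience_limit must be between 1 and 15,000,000");
  }
  const offset = p.offset ?? 0;
  if (offset < 0) problems.push("offset must be >= 0");
  if (offset > 0 && p.workflow_id === undefined) {
    problems.push("workflow_id is required when offset > 0");
  }
  if (p.workflow_id !== undefined && !UUID_SHAPE.test(p.workflow_id)) {
    problems.push("workflow_id must be a valid UUID");
  }
  if (p.max_identifiers !== undefined && (p.max_identifiers < 1 || p.max_identifiers > 10)) {
    problems.push("max_identifiers must be between 1 and 10");
  }
  return problems;
}
```

Returning every problem at once, rather than throwing on the first, lets a caller fix a request in one pass.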

Output Format

Success Response:

{
  total: number;
  returned_count: number;
  sample: Array<{
    entity_id: string;
    email?: string;
    phone?: string;
    name?: string;
    address?: string;
    maid?: string;
  }>;
  export?: {
    url: string;
    format: string;
    rows: number;
    size_bytes?: number;
    expires_at: string;
    resource_uri: string;
  };
  has_more: boolean;
  next_offset?: number;
  tool_trace_id: string;
  workflow_id: string;
}

Response Fields:

Field	Type	Description
total	number	Total entities matching criteria
returned_count	number	Number of samples returned
sample	array	Sample records (default 10)
export	object	Export metadata (when format is csv/json/jsonl)
export.resource_uri	string	Workflow resource URI for the exported file
has_more	boolean	Whether more results exist beyond current page
next_offset	number	Offset for next page (when has_more is true)
tool_trace_id	string	OpenTelemetry trace ID
workflow_id	string	Workflow session identifier

Example Response:

{
  "total": 245000,
  "returned_count": 10,
  "sample": [
    {
      "entity_id": "123456",
      "email": "alice@example.com"
    },
    {
      "entity_id": "789012",
      "email": "bob@example.com"
    }
  ],
  "has_more": true,
  "next_offset": 10,
  "tool_trace_id": "a1b2c3d4e5f6",
  "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}
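
To fetch the next page, a client can carry next_offset and workflow_id from a response into the following request. The sketch below shows only that bookkeeping; nextPageParams, PageInfo, and FindRequest are hypothetical names, and how the request is actually sent depends on the client in use.

```typescript
// The response fields that matter for pagination.
interface PageInfo {
  has_more: boolean;
  next_offset?: number;
  workflow_id: string;
}

// A request object; only the pagination fields are typed here.
type FindRequest = { offset?: number; workflow_id?: string; [key: string]: unknown };

// Builds the request for the next page, or returns undefined when there is none.
function nextPageParams(previous: FindRequest, response: PageInfo): FindRequest | undefined {
  if (!response.has_more || response.next_offset === undefined) return undefined;
  // Reusing the returned workflow_id keeps the ordering stable across pages.
  return { ...previous, offset: response.next_offset, workflow_id: response.workflow_id };
}

// Using the values from the example response above:
const followUp = nextPageParams(
  { entity_type: "person", expression: "1000000001 AND 1000000002" },
  { has_more: true, next_offset: 10, workflow_id: "550e8400-e29b-41d4-a716-446655440000" },
);
```

Here followUp keeps the original entity_type and expression and adds offset 10 plus the workflow_id, which satisfies the rule that offset > 0 requires a workflow_id.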

Error Handling

Common Errors:

Unknown trait hash in expression: "Unknown cluster hash(es): <list>. For non-geo traits, use trait_search to discover valid hashes. For geo traits (states, ZIPs, counties, DMAs, etc.), use trait_get(domain=\"geo\", trait_name=<type>, trait_value=<value>) or read the trait://person?domain=geo resource."
No expression or location provided: "At least one search criterion is required: provide either an expression or a location filter."
Invalid expression syntax (unexpected token): "Unexpected token \"...\" in expression. Check syntax and operator usage"
Invalid expression syntax (unmatched parenthesis): "Missing closing parenthesis in expression. Each opening \"(\" must have a matching \")\""
offset > 0 without workflow_id: "workflow_id is required when offset > 0 to ensure deterministic ordering across paginated requests."

Usage Examples

Example 1: Simple trait-based search

{
  "entity_type": "person",
  "expression": "1000000001 AND 1000000002"
}

Example 2: Complex boolean with trait hashes

{
  "entity_type": "person",
  "expression": "(e3b0c44298fc1c149afbf4c8996fb924 OR 27ae41e4649b934ca495991b7852b855) AND NOT da39a3ee5e6b4b0d3255bfef95601890"
}

Example 3: Location-only search

{
  "entity_type": "person",
  "location": {
    "latitude": 40.7128,
    "longitude": -74.0060,
    "radius": 25,
    "unit": "miles"
  }
}

Example 4: Combined trait + location with export

{
  "entity_type": "person",
  "expression": "1000000001 AND 1000000002",
  "location": {
    "latitude": 37.7749,
    "longitude": -122.4194,
    "radius": 50,
    "unit": "km"
  },
  "domains": ["email", "phone", "name"],
  "format": "csv"
}

Example 5: Paginated results

{
  "entity_type": "person",
  "expression": "1000000001",
  "offset": 100,
  "audience_limit": 50,
  "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}

Named Geographies (geo traits)

entity_find accepts trait hashes from the geo domain alongside ordinary trait hashes. Geo trait hashes are not discoverable via trait_search — they come from two dedicated paths:

Known boundary value: trait_get(entity_type="person", domain="geo", trait_name=<type>, trait_value=<value>) — returns the hash for a specific boundary you already know (e.g., state=CA, zip5=94103).
Browse / discovery: read the MCP resource trait://person?domain=geo[&trait_name=<type>&limit=<n>&offset=<n>] — pages through the full geo catalog. Add &trait_name=<type> to scope to one boundary type.

Geo applies to entity_type: "person" only.

Supported boundary types and value formats:

`trait_name` (boundary type)	`trait_value` format	Example
`state`	USPS two-letter code	`"CA"`
`zip5`	5-digit ZIP code	`"94103"`
`county`	County name as stored	`"Los Angeles"`
`dma`	Nielsen DMA numeric ID	`"506"`
`cbsa`	5-digit OMB code	`"10100"`
`msa`	2-4 digit OMB code	`"1000"`
`congressional_district`	Two-digit district number (today; state-prefixed after a planned upstream rebuild)	`"12"`

CBSA and MSA values are OMB numeric codes — not the structured "City1-City2, ST" name. For name→code lookup, see the Census Bureau CBSA delineation files.

Example 6: Audience in a state

// 1. trait_get { entity_type: "person", domain: "geo", trait_name: "state", trait_value: "CA" }
//    → returns trait_hash for geo.state=CA
{
  "entity_type": "person",
  "expression": "<hash from trait_get>"
}

Example 7: Geo + behavioral composition

geo.state=CA AND interest.golf — California residents who exhibit a golf affinity:

// 1. trait_get { ..., trait_name: "state", trait_value: "CA" } → geo.state=CA hash
// 2. trait_search { query: "golf interest" } → interest.golf hash
{
  "entity_type": "person",
  "expression": "<geo.state=CA hash> AND <interest.golf hash>"
}

Example 8: Browse all states, then union / exclude

// Browse the full state catalog:
// resources/read trait://person?domain=geo&trait_name=state&limit=60

// CA or NY residents (hashes from trait_get for each state):
{
  "entity_type": "person",
  "expression": "<geo.state=CA hash> OR <geo.state=NY hash>"
}

// CA but NOT in the Sacramento DMA (DMA 862):
// trait_get { trait_name: "dma", trait_value: "862" } → Sacramento DMA hash
{
  "entity_type": "person",
  "expression": "<geo.state=CA hash> AND NOT <geo.dma=862 hash>"
}

Boolean composition, NOT, pagination, and CSV/JSON/JSONL exports work identically for geo expressions.
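
The union and exclusion patterns in Example 8 can be assembled with a few small string helpers. This is a sketch, not part of the tool: term, allOf, anyOf, and not are hypothetical names, and the hashes below are the placeholder values used elsewhere on this page, not real geo hashes. Real hashes must still come from trait_get or the trait://person?domain=geo resource.

```typescript
// A term is a numeric trait ID or a 32-character lowercase hex hash.
const TRAIT_TERM = /^(\d+|[0-9a-f]{32})$/;

function term(value: string): string {
  if (!TRAIT_TERM.test(value)) {
    throw new Error("not a trait ID or 32-char lowercase hex hash: " + value);
  }
  return value;
}

// Joins operands with an operator, parenthesizing when there is more than one.
function combine(op: "AND" | "OR", parts: string[]): string {
  if (parts.length === 0) throw new Error(op + " needs at least one operand");
  return parts.length === 1 ? parts[0] : "(" + parts.join(" " + op + " ") + ")";
}

const allOf = (...parts: string[]): string => combine("AND", parts);
const anyOf = (...parts: string[]): string => combine("OR", parts);
const not = (part: string): string => "NOT " + part;

// Placeholder hashes standing in for geo.state=CA, geo.state=NY, and geo.dma=862.
const CA = term("e3b0c44298fc1c149afbf4c8996fb924");
const NY = term("27ae41e4649b934ca495991b7852b855");
const SACRAMENTO_DMA = term("da39a3ee5e6b4b0d3255bfef95601890");

const caOrNy = anyOf(CA, NY);
const caOutsideSacramento = allOf(CA, not(SACRAMENTO_DMA));
```

Validating each hash in term before composing keeps a mistyped or fabricated value out of the expression string, which is easier to debug than an "Unknown cluster hash(es)" error from the server.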

Known limitation (congressional_district): until the upstream boundary index is rematerialised, district values are not state-prefixed. Querying geo.congressional_district=01 returns the union of district 01 across all states; use ZIP, county, or state targeting until this is fixed.

Known limitation (county): county values in the catalog are bare names with no state qualifier (e.g., "Jefferson" rather than "Jefferson, CO"). Calling trait_get(domain="geo", trait_name="county", trait_value="Jefferson") returns a hash that covers every Jefferson county nationwide, not just the one in a specific state. This is an upstream data fidelity issue with the same root cause as the congressional_district limitation. Use state or zip5 targeting when state-specific county boundaries matter.