Find entities that match trait criteria and/or a geographic location filter, returning a sample of results with optional export.
Quick Example
{
"entity_type": "person",
"expression": "1000000001 AND 1000000002"
}
Input Parameters
| Parameter | Type | Required | Default | Constraints | Description |
|---|---|---|---|---|---|
| entity_type | string | Yes | - | "person" or "business" | Type of entity to search |
| expression | string | No | - | Boolean expression | Trait IDs/hashes with AND/OR/NOT operators |
| location | object | No | - | lat/lng/radius/unit | Geospatial filter |
| domains | array | No | ["email"] (person), ["name"] (business) | Max 5 strings | Domains to include in results. Only entities with at least one matching domain are returned |
| audience_limit | number | No | unlimited (format='none'); 200,000 (export formats) | 1 to 15,000,000 | Maximum entities to return. For counting queries (format='none'), omit to get the full match count. Lower default for exports keeps requests within the server cost budget |
| offset | number | No | 0 | >= 0 | Pagination offset. Requires workflow_id when > 0 |
| format | string | No | "none" | "none", "csv", "json", "jsonl" | Export format |
| max_identifiers | number | No | 3 | 1 to 10 | Max columns per identifier type in CSV export (e.g. email1..emailN) |
| workflow_id | string | No | - | Valid UUID | Required when offset > 0 for deterministic ordering |
Parameter Details:
expression:
- Boolean expression using trait IDs (numeric) or trait hashes (32-character lowercase hex strings)
- Supports: AND, OR, NOT, and parentheses for grouping
- Mixing trait IDs and trait hashes is allowed
- Can be omitted for location-only queries
- Important: Discover trait hashes before use; never guess or fabricate them. Non-geo hashes: use trait_search. Geo hashes (states, ZIPs, counties, DMAs, etc.): use trait_get(domain="geo", trait_name=<type>, trait_value=<value>) or browse trait://person?domain=geo.
- Trait hashes must be exactly 32 lowercase hex characters (MD5 format), e.g. e3b0c44298fc1c149afbf4c8996fb924
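The token rules above (numeric trait IDs, 32-character lowercase hex hashes, the three operators, parentheses) can be checked client-side before a request is sent. The following is a minimal sketch, not the server's parser: findMalformedTokens is a hypothetical helper, and the whitespace-based tokenizer and the assumption that operators are uppercase are illustrative only. It validates token shape; it cannot confirm that a hash exists, so hashes still need to come from trait_search or trait_get.

```typescript
// Token shapes come from the section above; the tokenizer is an illustrative assumption.
const TRAIT_ID = /^\d+$/; // numeric trait ID, e.g. 1000000001
const TRAIT_HASH = /^[0-9a-f]{32}$/; // exactly 32 lowercase hex characters
const OPERATORS = new Set(["AND", "OR", "NOT"]);

// Hypothetical helper: returns every token that is not an operator, a parenthesis,
// a numeric ID, or a well-formed hash. Shape check only; it does not look up hashes.
function findMalformedTokens(expression: string): string[] {
  return expression
    .replace(/[()]/g, " $& ") // pad parentheses so they split into their own tokens
    .split(/\s+/)
    .filter((token) => token.length > 0)
    .filter(
      (token) =>
        token !== "(" &&
        token !== ")" &&
        !OPERATORS.has(token) &&
        !TRAIT_ID.test(token) &&
        !TRAIT_HASH.test(token)
    );
}
```

An uppercase or truncated hash is reported as malformed, which catches the most common copy-paste mistakes before the server returns an "Unknown cluster hash(es)" error.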
Expression syntax:
"123 AND 456" // Both traits
"(123 OR 456) AND NOT 789" // Either of two traits, excluding one
"e3b0c44298fc1c149afbf4c8996fb924 AND 27ae41e4649b934ca495991b7852b855" // Using trait hashes
"123 AND e3b0c44298fc1c149afbf4c8996fb924" // Mixing IDs and hashes
location:
- Uses H3 resolution 9 (~0.2km edge length) for approximate radius matching
- Example:
{ "latitude": 37.7749, "longitude": -122.4194, "radius": 5, "unit": "km" }
domains:
- Controls which data domains are included in the results
- Defaults to ["email"] for persons, ["name"] for businesses
- Only entities with at least one matching domain are returned
- Maximum 5 domains per request
- Accepts identifier-kind domains (name, email, phone, address, maid, website, social), person trait-kind domains (affinity, content, demographic, employment, financial, household, intent, interest, lifestyle, political, purchase), and business trait-kind domains (about, appstore, digital, funding, hiring, industry, techstack). Values outside this set are rejected with a validation error.
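The domain rules above can be mirrored in a small pre-flight check. This is a sketch under stated assumptions: classifyDomains and the DomainCheck shape are hypothetical names, and the check accepts the union of person and business trait-kind values because the text does not say whether the server restricts them by entity_type.

```typescript
// Domain names are copied from the list above; the helper itself is hypothetical.
const IDENTIFIER_KINDS = new Set(["name", "email", "phone", "address", "maid", "website", "social"]);
const TRAIT_KINDS = new Set([
  // person trait-kind
  "affinity", "content", "demographic", "employment", "financial", "household",
  "intent", "interest", "lifestyle", "political", "purchase",
  // business trait-kind
  "about", "appstore", "digital", "funding", "hiring", "industry", "techstack",
]);
const MAX_DOMAINS = 5;

interface DomainCheck {
  channels: string[]; // identifier-kind values
  enrichment: string[]; // trait-kind values
  rejected: string[]; // values outside the documented set
  tooMany: boolean; // more than 5 domains requested
}

function classifyDomains(domains: string[]): DomainCheck {
  return {
    channels: domains.filter((d) => IDENTIFIER_KINDS.has(d)),
    enrichment: domains.filter((d) => TRAIT_KINDS.has(d)),
    rejected: domains.filter((d) => !IDENTIFIER_KINDS.has(d) && !TRAIT_KINDS.has(d)),
    tooMany: domains.length > MAX_DOMAINS,
  };
}
```

A request is worth sending only when rejected is empty and tooMany is false. The channels and enrichment counts are the same two quantities the export cost budget is computed from.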
audience_limit:
- Defaults: unlimited when format="none" (counting path: returned_count equals the full match count) and 200,000 for the csv/json/jsonl export formats. Hard maximum of 15,000,000 in either mode.
- Results are ordered deterministically by workflow_id, so the same query with the same workflow_id returns the same sample across calls.
- Export cost budget: non-"none" formats enforce audience_limit × (channels + 2 × enrichment_domains) ≤ 3,000,000, where channels is the number of identifier-kind values in domains and enrichment_domains is the number of trait-kind values. Effective audience_limit caps for contact-only exports, by number of domains: 1 domain → 3M, 2 → 1.5M, 3 → 1M, 4 → 750k, 5 → 600k. If a request exceeds the budget, the server rejects it with an error that includes a suggested lower audience_limit. Retry with that value; do not change the expression to chase a smaller match count.
max_identifiers:
- Controls how many columns per identifier type appear in CSV exports (e.g. email1..emailN, phone1..phoneN)
- Default: 3, maximum: 10
- Only applies when format is "csv"
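The export cost budget described under audience_limit is simple arithmetic, so it can be evaluated before an export request is sent. The sketch below assumes only the formula and limits stated in this section; exportCost and maxExportAudienceLimit are hypothetical helper names, and rounding down to a whole entity count is an assumption.

```typescript
const EXPORT_BUDGET = 3000000; // cost ceiling for csv/json/jsonl exports
const HARD_MAX = 15000000; // absolute audience_limit ceiling in any mode

// Cost of an export: audience_limit × (channels + 2 × enrichment_domains).
function exportCost(audienceLimit: number, channels: number, enrichmentDomains: number): number {
  return audienceLimit * (channels + 2 * enrichmentDomains);
}

// Largest audience_limit that stays within the budget for a given domain mix.
function maxExportAudienceLimit(channels: number, enrichmentDomains: number): number {
  const weight = channels + 2 * enrichmentDomains;
  if (weight <= 0) {
    throw new Error("at least one domain is required");
  }
  return Math.min(HARD_MAX, Math.floor(EXPORT_BUDGET / weight));
}
```

For example, domains of ["email", "phone", "interest"] give a weight of 2 + 2 × 1 = 4, so the cap is 750,000. The default export limit of 200,000 with three contact domains costs 600,000 and fits comfortably.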
Request Schema:
interface EntityFindParams {
entity_type: "person" | "business";
expression?: string;
location?: {
latitude: number;
longitude: number;
radius: number;
unit: "km" | "miles";
};
domains?: string[];
audience_limit?: number;
offset?: number;
format?: "none" | "csv" | "json" | "jsonl";
max_identifiers?: number;
workflow_id?: string;
}
Output Format
Success Response:
{
total: number,
returned_count: number,
sample: Array<{
entity_id: string;
email?: string;
phone?: string;
name?: string;
address?: string;
maid?: string;
}>,
export?: {
url: string;
format: string;
rows: number;
size_bytes?: number;
expires_at: string;
resource_uri: string;
},
has_more: boolean,
next_offset?: number,
tool_trace_id: string,
workflow_id: string
}
Response Fields:
| Field | Type | Description |
|---|---|---|
| total | number | Total entities matching criteria |
| returned_count | number | Number of samples returned |
| sample | array | Sample records (default 10) |
| export | object | Export metadata (when format is csv/json/jsonl) |
| export.resource_uri | string | Workflow resource URI for the exported file |
| has_more | boolean | Whether more results exist beyond current page |
| next_offset | number | Offset for next page (when has_more is true) |
| tool_trace_id | string | OpenTelemetry trace ID |
| workflow_id | string | Workflow session identifier |
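The has_more, next_offset, and workflow_id fields together describe how to request the next page. The following is a rough sketch of that step as a pure function; FindParams, FindPage, and nextPageParams are hypothetical names that cover only the fields needed here, and how the request is actually sent is left out.

```typescript
// Trimmed-down shapes for illustration; see the Request Schema and Success Response for the full fields.
interface FindParams {
  entity_type: "person" | "business";
  expression?: string;
  audience_limit?: number;
  offset?: number;
  workflow_id?: string;
}

interface FindPage {
  has_more: boolean;
  next_offset?: number;
  workflow_id: string;
}

// Builds the parameters for the following page, or returns null when there is nothing more to fetch.
function nextPageParams(previous: FindParams, page: FindPage): FindParams | null {
  if (!page.has_more || page.next_offset === undefined) {
    return null;
  }
  // Reuse the workflow_id from the response: it is required once offset > 0
  // and keeps the ordering stable across pages.
  return { ...previous, offset: page.next_offset, workflow_id: page.workflow_id };
}
```

Carrying the workflow_id forward from the first response avoids the "workflow_id is required when offset > 0" error on the second call.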
Example Response:
{
"total": 245000,
"returned_count": 10,
"sample": [
{
"entity_id": "123456",
"email": "alice@example.com"
},
{
"entity_id": "789012",
"email": "bob@example.com"
}
],
"has_more": true,
"next_offset": 10,
"tool_trace_id": "a1b2c3d4e5f6",
"workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}
Error Handling
Common Errors:
- Unknown trait hash in expression: "Unknown cluster hash(es): <list>. For non-geo traits, use trait_search to discover valid hashes. For geo traits (states, ZIPs, counties, DMAs, etc.), use trait_get(domain=\"geo\", trait_name=<type>, trait_value=<value>) or read the trait://person?domain=geo resource."
- No expression or location provided: "At least one search criterion is required: provide either an expression or a location filter."
- Invalid expression syntax (unexpected token): "Unexpected token \"...\" in expression. Check syntax and operator usage"
- Invalid expression syntax (unmatched parenthesis): "Missing closing parenthesis in expression. Each opening \"(\" must have a matching \")\""
- offset > 0 without workflow_id: "workflow_id is required when offset > 0 to ensure deterministic ordering across paginated requests."
Usage Examples
Example 1: Simple trait-based search
{
"entity_type": "person",
"expression": "1000000001 AND 1000000002"
}
Example 2: Complex boolean with trait hashes
{
"entity_type": "person",
"expression": "(e3b0c44298fc1c149afbf4c8996fb924 OR 27ae41e4649b934ca495991b7852b855) AND NOT da39a3ee5e6b4b0d3255bfef95601890"
}
Example 3: Location-only search
{
"entity_type": "person",
"location": {
"latitude": 40.7128,
"longitude": -74.0060,
"radius": 25,
"unit": "miles"
}
}
Example 4: Combined trait + location with export
{
"entity_type": "person",
"expression": "1000000001 AND 1000000002",
"location": {
"latitude": 37.7749,
"longitude": -122.4194,
"radius": 50,
"unit": "km"
},
"domains": ["email", "phone", "name"],
"format": "csv"
}
Example 5: Paginated results
{
"entity_type": "person",
"expression": "1000000001",
"offset": 100,
"audience_limit": 50,
"workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}
Named Geographies (geo traits)
entity_find accepts trait hashes from the geo domain alongside ordinary trait hashes. Geo trait hashes are not discoverable via trait_search — they come from two dedicated paths:
- Known boundary value: trait_get(entity_type="person", domain="geo", trait_name=<type>, trait_value=<value>) returns the hash for a specific boundary you already know (e.g., state=CA, zip5=94103).
- Browse / discovery: read the MCP resource trait://person?domain=geo[&trait_name=<type>&limit=<n>&offset=<n>], which pages through the full geo catalog. Add &trait_name=<type> to scope to one boundary type.
Geo applies to entity_type: "person" only.
Supported boundary types and value formats:
| trait_name (boundary type) | trait_value format | Example |
|---|---|---|
| state | USPS two-letter code | "CA" |
| zip5 | 5-digit ZIP code | "94103" |
| county | County name as stored | "Los Angeles" |
| dma | Nielsen DMA numeric ID | "506" |
| cbsa | 5-digit OMB code | "10100" |
| msa | 2-4 digit OMB code | "1000" |
| congressional_district | Two-digit district number (today; state-prefixed after a planned upstream rebuild) | "12" |
CBSA and MSA values are OMB numeric codes — not the structured "City1-City2, ST" name. For name→code lookup, see the Census Bureau CBSA delineation files.
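Because a wrongly formatted trait_value (for instance a CBSA name instead of its code) will not resolve to a hash, it can help to check the value's shape before calling trait_get. This is a sketch only: the regular expressions are one reading of the format column above, not the server's validation rules, and isPlausibleGeoValue is a hypothetical helper.

```typescript
// One interpretation of the documented value formats; the server may be stricter or looser.
const GEO_VALUE_FORMATS: Record<string, RegExp> = {
  state: /^[A-Z]{2}$/, // USPS two-letter code
  zip5: /^\d{5}$/, // 5-digit ZIP code
  county: /^\S.*$/, // any non-empty name, matched as stored
  dma: /^\d+$/, // Nielsen DMA numeric ID
  cbsa: /^\d{5}$/, // 5-digit OMB code
  msa: /^\d{2,4}$/, // 2-4 digit OMB code
  congressional_district: /^\d{2}$/, // two-digit district number (current format)
};

// Returns false for unsupported boundary types and for values whose shape does not match.
function isPlausibleGeoValue(traitName: string, traitValue: string): boolean {
  const format = GEO_VALUE_FORMATS[traitName];
  return format !== undefined && format.test(traitValue);
}
```

A passing check means only that the value looks right. Whether the boundary exists in the catalog is still decided by trait_get or by browsing trait://person?domain=geo.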
Example 6: Audience in a state
// 1. trait_get { entity_type: "person", domain: "geo", trait_name: "state", trait_value: "CA" }
// → returns trait_hash for geo.state=CA
{
"entity_type": "person",
"expression": "<hash from trait_get>"
}
Example 7: Geo + behavioral composition
geo.state=CA AND interest.golf — California residents who exhibit a golf affinity:
// 1. trait_get { ..., trait_name: "state", trait_value: "CA" } → geo.state=CA hash
// 2. trait_search { query: "golf interest" } → interest.golf hash
{
"entity_type": "person",
"expression": "<geo.state=CA hash> AND <interest.golf hash>"
}
Example 8: Browse all states, then union / exclude
// Browse the full state catalog:
// resources/read trait://person?domain=geo&trait_name=state&limit=60
// CA or NY residents (hashes from trait_get for each state):
{
"entity_type": "person",
"expression": "<geo.state=CA hash> OR <geo.state=NY hash>"
}
// CA but NOT in the Sacramento DMA (DMA 862):
// trait_get { trait_name: "dma", trait_value: "862" } → Sacramento DMA hash
{
"entity_type": "person",
"expression": "<geo.state=CA hash> AND NOT <geo.dma=862 hash>"
}
Boolean composition, NOT, pagination, and CSV/JSON/JSONL exports work identically for geo expressions.
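When hashes come back from several trait_get and trait_search calls, assembling the expression string by hand is error-prone. The helpers below are a small sketch of one way to compose it; allOf, anyOf, and not are hypothetical names, and the extra parentheses they emit rely on the documented support for grouping.

```typescript
// Each helper returns a fragment that can be nested inside another.
const allOf = (...terms: string[]): string => `(${terms.join(" AND ")})`;
const anyOf = (...terms: string[]): string => `(${terms.join(" OR ")})`;
const not = (term: string): string => `NOT ${term}`;

// Placeholder tokens standing in for hashes returned by trait_get; never hard-code real ones.
const caHash = "<geo.state=CA hash>";
const nyHash = "<geo.state=NY hash>";
const sacramentoDmaHash = "<geo.dma=862 hash>";

// CA or NY residents, excluding the Sacramento DMA.
const expression = allOf(anyOf(caHash, nyHash), not(sacramentoDmaHash));
// expression is "((<geo.state=CA hash> OR <geo.state=NY hash>) AND NOT <geo.dma=862 hash>)"
```

Wrapping every group in parentheses makes operator precedence explicit, so the result does not depend on how the server ranks AND against OR.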
Known limitation — congressional_district: until the upstream boundary index is rematerialised, district values are not yet state-prefixed. Querying geo.congressional_district=01 returns the union of district 01 across all states; use ZIP, county, or state targeting until this is fixed.
Known limitation — county: county values in the catalog are bare names with no state qualifier (e.g., "Jefferson" rather than "Jefferson, CO"). Calling trait_get(geo, county, "Jefferson") aggregates all Jefferson counties nationwide, not just the one in a specific state. This is an upstream data fidelity issue with the same root cause as the congressional_district limitation. Use state or zip5 targeting when state-specific county boundaries matter.