analyze_customers
Description: Analyzes customer data to generate ICP (Ideal Customer Profile) insights. Takes a CSV key from generate_url_for_upload and performs schema detection, identity resolution, intelligent domain selection, profile enrichment, cluster analysis, and ICP synthesis with personas.
The domain selection step uses an LLM to analyze the market context and select the 3-5 most relevant enrichment domains. The demographic domain is always included on top of these as a baseline. This keeps the analysis focused on the data that is most meaningful for your specific market context.
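As a client-side sanity check, the domain rule can be expressed as a small TypeScript function over the `selected_domains` field of the response. This is a sketch only: `checkSelectedDomains` is a hypothetical helper, not part of the tool, and it assumes domain names arrive as plain strings exactly as in the example response.

```typescript
// Minimal shape of the `selected_domains` field in the response (see Output Format).
interface SelectedDomains {
  domains: string[];
  reasoning: string;
}

// Returns a list of problems; an empty list means the field matches the documented rule:
// 'demographic' is always present, plus 3-5 domains chosen by the LLM.
function checkSelectedDomains(sel: SelectedDomains): string[] {
  const problems: string[] = [];
  if (!sel.domains.includes("demographic")) {
    problems.push("baseline 'demographic' domain is missing");
  }
  // Treat every distinct non-baseline domain as part of the LLM-selected set.
  const llmSelected = new Set(sel.domains.filter((d) => d !== "demographic"));
  if (llmSelected.size < 3 || llmSelected.size > 5) {
    problems.push(`expected 3-5 LLM-selected domains, got ${llmSelected.size}`);
  }
  return problems;
}
```

Applied to the example response (four LLM-selected domains plus demographic), it returns an empty list.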
Tool Identifier: analyze_customers
Input Parameters
| Parameter | Type | Required | Default | Constraints | Description |
|---|---|---|---|---|---|
| csv_key | string | Yes | - | Min length: 1 | S3 key from generate_url_for_upload |
| market_context | object | Yes | - | See below | Business context with iteration support |
| email_columns | string \| string[] | No | null | - | Override auto-detected email column(s) |
| phone_columns | string \| string[] | No | null | - | Override auto-detected phone column(s) |
| address_columns | string \| string[] | No | null | - | Override auto-detected address column(s) |
| workflow_id | string | Yes | - | Valid UUID | Workflow session identifier from generate_url_for_upload |
| offset | number | No | 0 | Integer ≥ 0 | Number of CSV data rows to skip before processing. Use with limit to paginate large CSVs across multiple calls |
| limit | number | No | 40000 | 1 ≤ limit ≤ 40000 | Maximum number of CSV data rows to process in this call. To process larger files, call repeatedly with increasing offset values |
market_context Object:
| Field | Type | Required | Description |
|---|---|---|---|
| base_context | string | Yes | Initial market/product description |
| refinements | string[] | No | Accumulated iteration prompts (default: []) |
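A hypothetical pre-flight check that mirrors the documented constraints can catch bad parameters before a call is made. The sketch below is not the server's validation logic; `validateParams` is an illustrative name, and the UUID pattern is an assumption (it accepts any 8-4-4-4-12 hexadecimal string without checking version or variant bits), so the tool may apply stricter rules.

```typescript
// Request shape copied from the Request Schema in this document.
interface AnalyzeCustomersParams {
  csv_key: string;
  market_context: { base_context: string; refinements?: string[] };
  email_columns?: string | string[] | null;
  phone_columns?: string | string[] | null;
  address_columns?: string | string[] | null;
  workflow_id: string;
  offset?: number;
  limit?: number;
}

const MAX_LIMIT = 40000;
// Assumption: any 8-4-4-4-12 hex string counts as a "valid UUID".
const UUID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Returns human-readable problems; an empty array means the documented constraints hold.
function validateParams(p: AnalyzeCustomersParams): string[] {
  const problems: string[] = [];
  if (p.csv_key.length < 1) problems.push("csv_key must not be empty");
  if (!UUID_PATTERN.test(p.workflow_id)) problems.push("workflow_id must be a valid UUID");
  const offset = p.offset ?? 0; // documented default
  if (!Number.isInteger(offset) || offset < 0) problems.push("offset must be an integer >= 0");
  const limit = p.limit ?? MAX_LIMIT; // documented default
  // Written as a negated range check so NaN is also rejected.
  if (!(limit >= 1 && limit <= MAX_LIMIT)) problems.push(`limit must be between 1 and ${MAX_LIMIT}`);
  return problems;
}
```

The request in Example 1 below passes this check unchanged.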
Request Schema:
interface AnalyzeCustomersParams {
csv_key: string;
market_context: {
base_context: string;
refinements?: string[];
};
email_columns?: string | string[] | null;
phone_columns?: string | string[] | null;
address_columns?: string | string[] | null;
workflow_id: string;
offset?: number;
limit?: number;
}
Output Format
Success Response:
{
detected_schema: {
email_columns: string[];
phone_columns: string[];
address_columns: string[];
confidence: {
email: number; // 0-1
phone: number; // 0-1
address: number; // 0-1
};
all_columns: string[];
sample_data: Record<string, string>[]; // First 3 rows
};
resolution: {
total_identifiers: number;
resolved: number;
resolution_rate: number; // 0-1
warning?: {
code: string;
message: string;
suggestions: string[];
};
};
selected_domains: {
domains: string[]; // Always includes 'demographic' plus 3-5 LLM-selected domains
reasoning: string; // LLM explanation of why these domains were selected
};
cluster_lookup_warning?: {
code: string;
message: string;
details?: any;
};
clusters: Array<{
cluster_hash: string;
name: string;
value: string;
domain: string;
audience_prevalence: number;
world_prevalence: number;
lift: number;
under_represented: boolean;
weighted_score: number;
}>; // Top 15 clusters by weighted score
icp_synthesis: {
personas: Array<{
name: string;
description: string;
key_traits: Array<{
cluster_hash: string;
domain: string;
name: string;
value: string;
lift: number;
insight: string;
}>;
expressions: {
precision: string; // AND-heavy expression
balanced: string; // Mixed AND/OR
reach: string; // OR-heavy expression
};
}>;
icp_analysis: {
summary: string;
top_distinctive_traits: Array<{
cluster_hash: string;
domain: string;
name: string;
value: string;
audience_percent: number;
world_percent: number;
lift: number;
}>;
exclusion_signals: Array<{
cluster_hash: string;
domain: string;
name: string;
value: string;
inverse_lift: number;
}>;
};
};
tool_trace_id: string;
workflow_id: string;
next_offset?: number; // Only present when more rows remain; pass back as offset
}
Example Response:
{
"detected_schema": {
"email_columns": ["email"],
"phone_columns": ["phone"],
"address_columns": [],
"confidence": { "email": 0.95, "phone": 0.85, "address": 0 },
"all_columns": ["email", "phone", "name", "company"],
"sample_data": [
{ "email": "alice@example.com", "phone": "555-0001", "name": "Alice", "company": "Acme" }
]
},
"resolution": {
"total_identifiers": 500,
"resolved": 425,
"resolution_rate": 0.85
},
"selected_domains": {
"domains": ["intent", "affinity", "financial", "employment", "demographic"],
"reasoning": "For B2B SaaS marketing automation, intent signals indicate purchase readiness, affinity reveals brand preferences, financial and employment data help qualify company fit, and demographic data provides decision-maker context."
},
"clusters": [
{
"cluster_hash": "a8f3bc1d2e3f4a5b6c7d8e9f0a1b2c3d",
"name": "tech_affinity",
"value": "high",
"domain": "affinity",
"audience_prevalence": 0.45,
"world_prevalence": 0.12,
"lift": 3.75,
"under_represented": false,
"weighted_score": 2.51
}
],
"icp_synthesis": {
"personas": [
{
"name": "Tech-Forward Professional",
"description": "High-income professionals with strong technology affinity.",
"key_traits": [
{
"cluster_hash": "a8f3bc1d2e3f4a5b6c7d8e9f0a1b2c3d",
"domain": "affinity",
"name": "tech_affinity",
"value": "high",
"lift": 3.75,
"insight": "3.75x more likely to have high tech affinity"
}
],
"expressions": {
"precision": "a8f3bc1d2e3f4a5b6c7d8e9f0a1b2c3d AND b2e4df5a6b7c8d9e0f1a2b3c4d5e6f7a",
"balanced": "a8f3bc1d2e3f4a5b6c7d8e9f0a1b2c3d AND b2e4df5a6b7c8d9e0f1a2b3c4d5e6f7a",
"reach": "a8f3bc1d2e3f4a5b6c7d8e9f0a1b2c3d OR b2e4df5a6b7c8d9e0f1a2b3c4d5e6f7a"
}
}
],
"icp_analysis": {
"summary": "Your customers are distinctively tech-forward professionals.",
"top_distinctive_traits": [],
"exclusion_signals": []
}
},
"tool_trace_id": "a1b2c3d4e5f6",
"workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}
Error Handling
Common Errors:
- CSV not found: "CSV file not found or inaccessible: <csv_key>"
- Empty CSV: "CSV file is empty or contains no data rows"
- No identifier columns: "No identifier columns detected. Please provide email_column, phone_column, or address_column hints." Supply these hints through the email_columns, phone_columns, or address_columns parameters.
Pagination:
For files larger than 40,000 rows, paginate by calling the tool repeatedly with increasing offset values. When more rows remain, the response includes a next_offset field; pass it back as offset on the next call. The field is omitted on the last page, when the call has consumed the rest of the file.
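A minimal pagination loop, sketched under the assumption that your client exposes the tool as an async function; `callAnalyzeCustomers` and `analyzeAllPages` are hypothetical names. It starts at offset 0 and stops when a page comes back without next_offset. How per-page results should be combined is not specified here, so the sketch simply returns every page.

```typescript
// Only the response field the loop needs; real responses carry much more (see Output Format).
interface PageResult {
  next_offset?: number; // present only when more rows remain
}

interface PageParams {
  csv_key: string;
  market_context: { base_context: string; refinements?: string[] };
  workflow_id: string;
  offset?: number;
  limit?: number;
}

// Hypothetical helper: calls the tool page by page until next_offset is omitted.
async function analyzeAllPages<R extends PageResult>(
  base: Omit<PageParams, "offset" | "limit">,
  callAnalyzeCustomers: (params: PageParams) => Promise<R>, // your client's tool invocation (assumed)
  limit = 40000,
): Promise<R[]> {
  const pages: R[] = [];
  let offset: number | undefined = 0;
  while (offset !== undefined) {
    // Explicit annotations keep TypeScript from inferring a circular type across iterations.
    const page: R = await callAnalyzeCustomers({ ...base, offset, limit });
    pages.push(page);
    const next: number | undefined = page.next_offset;
    // Guard against a stalled cursor so a misbehaving response cannot loop forever.
    if (next !== undefined && next <= offset) {
      throw new Error(`next_offset ${next} did not advance past ${offset}`);
    }
    offset = next;
  }
  return pages;
}
```

With 95,000 data rows and the default limit, this issues three calls at offsets 0, 40000, and 80000.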
Resolution Warnings:
When the resolution rate is below 20%, the response includes a warning:
{
"resolution": {
"warning": {
"code": "LOW_RESOLUTION_RATE",
"message": "Only 15.2% of identifiers were matched. Results may not be representative.",
"suggestions": [
"Verify column detection was correct",
"Check data quality (invalid emails, formatting issues)",
"B2B emails may have lower match rates than consumer data"
]
}
}
}
Usage Examples
Example 1: Basic analysis
{
"csv_key": "uploads/abc123-def456/customer-data.csv",
"market_context": {
"base_context": "B2B SaaS platform for marketing automation",
"refinements": []
},
"workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}
Example 2: With iteration refinements
{
"csv_key": "uploads/abc123-def456/customer-data.csv",
"market_context": {
"base_context": "B2B SaaS platform for marketing automation",
"refinements": [
"Focus on enterprise customers with 100+ employees",
"Exclude small businesses and startups"
]
},
"workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}
Example 3: With column hints
{
"csv_key": "uploads/abc123-def456/customer-data.csv",
"market_context": {
"base_context": "E-commerce retailer",
"refinements": []
},
"email_columns": "contact_email",
"phone_columns": "mobile_number",
"workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}
Example 4: With multi-column support
{
"csv_key": "uploads/abc123-def456/customer-data.csv",
"market_context": {
"base_context": "E-commerce retailer",
"refinements": []
},
"email_columns": ["contact_email", "secondary_email", "billing_email"],
"phone_columns": ["mobile_number", "business_phone"],
"workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}
Next Steps
After analyzing customers, use the personas from icp_synthesis to generate lookalike audiences with generate_audience.
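A small sketch of that hand-off: it gathers one expression per persona at a chosen tier (precision, balanced, or reach) and surfaces any low-resolution warning so you can judge whether the personas are representative enough to use. `personaExpressions` is a hypothetical helper, and the sketch deliberately stops short of calling generate_audience, whose parameters are not documented in this section.

```typescript
type Tier = "precision" | "balanced" | "reach";

// Only the response fields this sketch reads (see Output Format).
interface AnalyzeCustomersResult {
  resolution: {
    resolution_rate: number;
    warning?: { code: string; message: string; suggestions: string[] };
  };
  icp_synthesis: {
    personas: Array<{ name: string; expressions: Record<Tier, string> }>;
  };
}

interface PersonaExpression {
  persona: string;
  expression: string;
}

// Picks one expression per persona at the requested tier and passes through any
// resolution warning message (absent when the response carried no warning).
function personaExpressions(
  result: AnalyzeCustomersResult,
  tier: Tier = "balanced",
): { warning?: string; expressions: PersonaExpression[] } {
  const expressions = result.icp_synthesis.personas.map((p) => ({
    persona: p.name,
    expression: p.expressions[tier],
  }));
  const warning = result.resolution.warning?.message;
  return warning === undefined ? { expressions } : { warning, expressions };
}
```

Choosing the tier is a trade-off the output schema already describes: precision favors AND-heavy expressions, reach favors OR-heavy ones.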