analyze_customers

Description: Analyzes customer data to generate ICP (Ideal Customer Profile) insights. Takes a CSV key from generate_url_for_upload and performs schema detection, identity resolution, intelligent domain selection, profile enrichment, cluster analysis, and ICP synthesis with personas.

The domain selection step uses an LLM to analyze the market context and select the 3-5 most relevant enrichment domains. The demographic domain is always included as a baseline in addition to these. This keeps the analysis focused on the data that is most meaningful for your specific market context.

Tool Identifier: analyze_customers

Input Parameters

Parameter	Type	Required	Default	Constraints	Description
csv_key	string	Yes	-	Min length: 1	S3 key from generate_url_for_upload
market_context	object	Yes	-	See below	Business context with iteration support
email_columns	string | string[]	No	null	-	Override auto-detected email column(s)
phone_columns	string | string[]	No	null	-	Override auto-detected phone column(s)
address_columns	string | string[]	No	null	-	Override auto-detected address column(s)
workflow_id	string	Yes	-	Valid UUID	Workflow session identifier from generate_url_for_upload
offset	number	No	0	Integer ≥ 0	Number of CSV data rows to skip before processing. Use with `limit` to paginate large CSVs across multiple calls
limit	number	No	40000	1 ≤ limit ≤ 40000	Maximum number of CSV data rows to process in this call. To process larger files, call repeatedly with increasing `offset` values

market_context Object:

Field	Type	Required	Description
base_context	string	Yes	Initial market/product description
refinements	array	No	Accumulated iteration prompts (default: [])

Request Schema:

interface AnalyzeCustomersParams {
  csv_key: string;
  market_context: {
    base_context: string;
    refinements?: string[];
  };
  email_columns?: string | string[] | null;
  phone_columns?: string | string[] | null;
  address_columns?: string | string[] | null;
  workflow_id: string;
  offset?: number;
  limit?: number;
}
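
The constraints in the parameter table can be checked on the client before a call is made. The following is a minimal sketch, not part of the tool itself: `validateParams` and `UUID_PATTERN` are hypothetical names, and the integer check on `limit` is an assumption (the table only states the range, but the value counts rows).

```typescript
// Client-side pre-flight check mirroring the documented constraints.
// `validateParams` and `UUID_PATTERN` are illustrative names, not part of the tool.
interface AnalyzeCustomersParams {
  csv_key: string;
  market_context: { base_context: string; refinements?: string[] };
  email_columns?: string | string[] | null;
  phone_columns?: string | string[] | null;
  address_columns?: string | string[] | null;
  workflow_id: string;
  offset?: number;
  limit?: number;
}

// Generic 8-4-4-4-12 hexadecimal UUID layout.
const UUID_PATTERN =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Returns a list of human-readable problems; an empty list means the
// parameters satisfy the documented constraints.
function validateParams(p: AnalyzeCustomersParams): string[] {
  const problems: string[] = [];
  if (p.csv_key.length < 1) {
    problems.push("csv_key must have at least 1 character");
  }
  if (!UUID_PATTERN.test(p.workflow_id)) {
    problems.push("workflow_id must be a valid UUID");
  }
  const offset = p.offset ?? 0; // documented default
  if (!Number.isInteger(offset) || offset < 0) {
    problems.push("offset must be an integer >= 0");
  }
  const limit = p.limit ?? 40000; // documented default
  // Assumption: limit is an integer because it counts rows.
  if (!Number.isInteger(limit) || limit < 1 || limit > 40000) {
    problems.push("limit must be between 1 and 40000");
  }
  return problems;
}
```

Returning a list rather than throwing on the first failure lets a caller report every problem at once.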

Output Format

Success Response:

{
  detected_schema: {
    email_columns: string[];
    phone_columns: string[];
    address_columns: string[];
    confidence: {
      email: number;    // 0-1
      phone: number;    // 0-1
      address: number;  // 0-1
    };
    all_columns: string[];
    sample_data: Record<string, string>[];  // First 3 rows
  };
  resolution: {
    total_identifiers: number;
    resolved: number;
    resolution_rate: number;  // 0-1
    warning?: {
      code: string;
      message: string;
      suggestions: string[];
    };
  };
  selected_domains: {
    domains: string[];   // Always includes 'demographic' plus 3-5 LLM-selected domains
    reasoning: string;   // LLM explanation of why these domains were selected
  };
  cluster_lookup_warning?: {
    code: string;
    message: string;
    details?: any;
  };
  clusters: Array<{
    cluster_hash: string;
    name: string;
    value: string;
    domain: string;
    audience_prevalence: number;
    world_prevalence: number;
    lift: number;
    under_represented: boolean;
    weighted_score: number;
  }>;  // Top 15 clusters by weighted score
  icp_synthesis: {
    personas: Array<{
      name: string;
      description: string;
      key_traits: Array<{
        cluster_hash: string;
        domain: string;
        name: string;
        value: string;
        lift: number;
        insight: string;
      }>;
      expressions: {
        precision: string;  // AND-heavy expression
        balanced: string;   // Mixed AND/OR
        reach: string;      // OR-heavy expression
      };
    }>;
    icp_analysis: {
      summary: string;
      top_distinctive_traits: Array<{
        cluster_hash: string;
        domain: string;
        name: string;
        value: string;
        audience_percent: number;
        world_percent: number;
        lift: number;
      }>;
      exclusion_signals: Array<{
        cluster_hash: string;
        domain: string;
        name: string;
        value: string;
        inverse_lift: number;
      }>;
    };
  };
  next_offset?: number;  // Present only when more rows remain; omitted on the last page
  tool_trace_id: string;
  workflow_id: string;
}

Example Response:

{
  "detected_schema": {
    "email_columns": ["email"],
    "phone_columns": ["phone"],
    "address_columns": [],
    "confidence": { "email": 0.95, "phone": 0.85, "address": 0 },
    "all_columns": ["email", "phone", "name", "company"],
    "sample_data": [
      { "email": "alice@example.com", "phone": "555-0001", "name": "Alice", "company": "Acme" }
    ]
  },
  "resolution": {
    "total_identifiers": 500,
    "resolved": 425,
    "resolution_rate": 0.85
  },
  "selected_domains": {
    "domains": ["intent", "affinity", "financial", "employment", "demographic"],
    "reasoning": "For B2B SaaS marketing automation, intent signals indicate purchase readiness, affinity reveals brand preferences, financial and employment data help qualify company fit, and demographic data provides decision-maker context."
  },
  "clusters": [
    {
      "cluster_hash": "a8f3bc1d2e3f4a5b6c7d8e9f0a1b2c3d",
      "name": "tech_affinity",
      "value": "high",
      "domain": "affinity",
      "audience_prevalence": 0.45,
      "world_prevalence": 0.12,
      "lift": 3.75,
      "under_represented": false,
      "weighted_score": 2.51
    }
  ],
  "icp_synthesis": {
    "personas": [
      {
        "name": "Tech-Forward Professional",
        "description": "High-income professionals with strong technology affinity.",
        "key_traits": [
          {
            "cluster_hash": "a8f3bc1d2e3f4a5b6c7d8e9f0a1b2c3d",
            "domain": "affinity",
            "name": "tech_affinity",
            "value": "high",
            "lift": 3.75,
            "insight": "3.75x more likely to have high tech affinity"
          }
        ],
        "expressions": {
          "precision": "a8f3bc1d2e3f4a5b6c7d8e9f0a1b2c3d AND b2e4df5a6b7c8d9e0f1a2b3c4d5e6f7a",
          "balanced": "a8f3bc1d2e3f4a5b6c7d8e9f0a1b2c3d AND b2e4df5a6b7c8d9e0f1a2b3c4d5e6f7a",
          "reach": "a8f3bc1d2e3f4a5b6c7d8e9f0a1b2c3d OR b2e4df5a6b7c8d9e0f1a2b3c4d5e6f7a"
        }
      }
    ],
    "icp_analysis": {
      "summary": "Your customers are distinctively tech-forward professionals.",
      "top_distinctive_traits": [],
      "exclusion_signals": []
    }
  },
  "tool_trace_id": "a1b2c3d4e5f6",
  "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}
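
In the example response, the cluster's lift (3.75) equals its audience_prevalence (0.45) divided by its world_prevalence (0.12). The sketch below assumes that this ratio is how lift is defined; the documentation does not state the formula, and `estimateLift` is a hypothetical helper for sanity-checking returned values.

```typescript
interface ClusterPrevalence {
  audience_prevalence: number; // share of the analyzed audience, 0-1
  world_prevalence: number;    // share of the reference population, 0-1
}

// Assumption: lift = audience_prevalence / world_prevalence.
// Returns null when the ratio is undefined (world_prevalence of 0).
function estimateLift(c: ClusterPrevalence): number | null {
  if (c.world_prevalence === 0) return null;
  return c.audience_prevalence / c.world_prevalence;
}

// Compares a reported lift against the estimate within a small tolerance,
// since the reported value may be rounded.
function liftMatches(
  c: ClusterPrevalence & { lift: number },
  tolerance = 0.01,
): boolean {
  const estimate = estimateLift(c);
  return estimate !== null && Math.abs(estimate - c.lift) <= tolerance;
}
```

Under this assumption, a lift above 1 means the trait is more common among your customers than in the reference population.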

Error Handling

Common Errors:

CSV not found: "CSV file not found or inaccessible: <csv_key>"
Empty CSV: "CSV file is empty or contains no data rows"
No identifier columns: "No identifier columns detected. Please provide email_column, phone_column, or address_column hints." (Supply the hints through the email_columns, phone_columns, or address_columns parameters.)
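
A caller may want to branch on which of these errors occurred. The sketch below matches on the documented message prefixes; `classifyError` and the `AnalyzeErrorKind` labels are hypothetical, and how the error message reaches your code (thrown exception, error field, and so on) depends on your client and is not specified here.

```typescript
type AnalyzeErrorKind =
  | "csv_not_found"
  | "empty_csv"
  | "no_identifier_columns"
  | "unknown";

// Maps a documented error message to a coarse category by its leading text.
// Messages not listed in this section fall through to "unknown".
function classifyError(message: string): AnalyzeErrorKind {
  if (message.startsWith("CSV file not found or inaccessible")) {
    return "csv_not_found";
  }
  if (message.startsWith("CSV file is empty")) {
    return "empty_csv";
  }
  if (message.startsWith("No identifier columns detected")) {
    return "no_identifier_columns";
  }
  return "unknown";
}
```

Matching on message text is brittle if the wording changes, so treat "unknown" as a normal outcome rather than a bug.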

Pagination:

For files larger than 40,000 rows, paginate by calling the tool repeatedly. When more rows remain, the response includes a next_offset field; pass it back as offset on the next call. The field is omitted on the last page, when the call has consumed the rest of the file.
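
The pagination rule can be expressed as a small pure function that derives the next call's offset and limit from the previous response. This is a sketch only: `nextPage` and `PageRequest` are hypothetical names, and the check that next_offset advances is a defensive assumption rather than documented behavior.

```typescript
interface PageRequest {
  offset: number;
  limit: number;
}

// Returns the offset/limit for the next call, or null when the response
// carried no next_offset (the last page was just processed).
function nextPage(
  current: PageRequest,
  response: { next_offset?: number },
): PageRequest | null {
  if (response.next_offset === undefined) {
    return null; // rest of the file was consumed
  }
  // Defensive assumption: a next_offset that does not move forward
  // would cause an endless loop, so fail loudly instead.
  if (response.next_offset <= current.offset) {
    throw new Error("next_offset did not advance past the current offset");
  }
  return { offset: response.next_offset, limit: current.limit };
}
```

A driver loop would start at offset 0, call the tool with the current page merged into the other parameters, and repeat with the result of `nextPage` until it returns null.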

Resolution Warnings:

When the resolution rate is below 20%, the response includes a warning under resolution.warning:

{
  "resolution": {
    "warning": {
      "code": "LOW_RESOLUTION_RATE",
      "message": "Only 15.2% of identifiers were matched. Results may not be representative.",
      "suggestions": [
        "Verify column detection was correct",
        "Check data quality (invalid emails, formatting issues)",
        "B2B emails may have lower match rates than consumer data"
      ]
    }
  }
}
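
Because the warning is optional, callers should check for it before trusting the clusters and personas. The following sketch turns the resolution block into log-friendly lines; `describeResolution` is a hypothetical helper, and the field names come from the response schema in this section.

```typescript
interface ResolutionWarning {
  code: string;
  message: string;
  suggestions: string[];
}

interface Resolution {
  total_identifiers: number;
  resolved: number;
  resolution_rate: number; // 0-1
  warning?: ResolutionWarning;
}

// Builds a short report: one summary line, plus the warning and its
// suggestions when the tool attached one.
function describeResolution(r: Resolution): string[] {
  const percent = (r.resolution_rate * 100).toFixed(1);
  const lines = [
    `Resolved ${r.resolved} of ${r.total_identifiers} identifiers (${percent}%)`,
  ];
  if (r.warning) {
    lines.push(`[${r.warning.code}] ${r.warning.message}`);
    for (const suggestion of r.warning.suggestions) {
      lines.push(`  - ${suggestion}`);
    }
  }
  return lines;
}
```

Checking for the presence of the warning, rather than re-deriving the 20% threshold on the client, keeps the caller in step with whatever threshold the tool applies.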

Usage Examples

Example 1: Basic analysis

{
  "csv_key": "uploads/abc123-def456/customer-data.csv",
  "market_context": {
    "base_context": "B2B SaaS platform for marketing automation",
    "refinements": []
  },
  "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}

Example 2: With iteration refinements

{
  "csv_key": "uploads/abc123-def456/customer-data.csv",
  "market_context": {
    "base_context": "B2B SaaS platform for marketing automation",
    "refinements": [
      "Focus on enterprise customers with 100+ employees",
      "Exclude small businesses and startups"
    ]
  },
  "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}

Example 3: With column hints

{
  "csv_key": "uploads/abc123-def456/customer-data.csv",
  "market_context": {
    "base_context": "E-commerce retailer",
    "refinements": []
  },
  "email_columns": "contact_email",
  "phone_columns": "mobile_number",
  "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}

Example 4: With multi-column support

{
  "csv_key": "uploads/abc123-def456/customer-data.csv",
  "market_context": {
    "base_context": "E-commerce retailer",
    "refinements": []
  },
  "email_columns": ["contact_email", "secondary_email", "billing_email"],
  "phone_columns": ["mobile_number", "business_phone"],
  "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}
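
Examples 3 and 4 show that a column hint may be a single string or an array. If your own code builds these hints, it can help to normalize them to one shape first. The sketch below is a client-side convenience only; `normalizeColumns` is a hypothetical name, and the trimming and de-duplication are assumptions about what makes a clean hint, not documented tool behavior.

```typescript
type ColumnHint = string | string[] | null | undefined;

// Converts a hint to a de-duplicated array of non-empty column names,
// or null when there is nothing to override (auto-detection applies).
function normalizeColumns(hint: ColumnHint): string[] | null {
  if (hint === null || hint === undefined) {
    return null;
  }
  const list = Array.isArray(hint) ? hint : [hint];
  const cleaned = list
    .map((name) => name.trim())
    .filter((name) => name.length > 0);
  // Keep the first occurrence of each name, preserving order.
  const unique = cleaned.filter((name, i) => cleaned.indexOf(name) === i);
  return unique.length > 0 ? unique : null;
}
```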

Next Steps

After analyzing customers, use the personas from icp_synthesis to generate lookalike audiences with generate_audience.
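
Each persona carries three expressions (precision, balanced, reach). As a sketch of preparing for that next step, the helper below collects one expression variant per persona. `pickExpressions` is a hypothetical name, and the input format of generate_audience is not described in this section, so the output shape here is illustrative only.

```typescript
type ExpressionKind = "precision" | "balanced" | "reach";

interface PersonaExpressions {
  name: string;
  expressions: Record<ExpressionKind, string>;
}

interface SelectedExpression {
  persona: string;
  kind: ExpressionKind;
  expression: string;
}

// Collects the chosen expression variant from every persona in
// icp_synthesis.personas, skipping personas whose variant is empty.
function pickExpressions(
  personas: PersonaExpressions[],
  kind: ExpressionKind,
): SelectedExpression[] {
  return personas
    .filter((p) => p.expressions[kind].trim().length > 0)
    .map((p) => ({
      persona: p.name,
      kind,
      expression: p.expressions[kind],
    }));
}
```

Choosing "precision" favors a narrower, AND-heavy audience, while "reach" favors a broader, OR-heavy one, as noted in the response schema.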