Dec 14, 2025

A/B Testing Agent Prompts

Hyperfold Team

Agents, Testing

Experiment Design

A/B testing agent prompts helps you optimize for conversion, satisfaction, and revenue. Before running experiments, define clear hypotheses and metrics.

Key considerations:

Test one variable at a time for clear causation
Define primary metric before starting
Run until the result reaches statistical significance (typically 1,000+ sessions)
Consider guardrail metrics to catch negative effects
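
How many sessions you actually need depends on your baseline conversion rate and the size of the lift you hope to detect. As a rough planning aid, here is a minimal sketch of the standard two-proportion sample-size approximation. It assumes a two-sided 5% significance level and 80% power; the function name `sessionsPerVariant` is hypothetical and not part of any Hyperfold API.

```typescript
// Rough sample-size estimate for comparing two conversion rates.
// Assumptions (not from Hyperfold): two-sided alpha = 0.05 and 80% power.
const Z_ALPHA = 1.959964; // normal quantile for a two-sided 5% test
const Z_BETA = 0.841621; // normal quantile for 80% power

function sessionsPerVariant(baselineRate: number, expectedRate: number): number {
  const delta = expectedRate - baselineRate;
  if (delta === 0) {
    throw new Error("baselineRate and expectedRate must differ");
  }
  // Sum of the two binomial variances, p(1 - p), one per variant
  const variance =
    baselineRate * (1 - baselineRate) + expectedRate * (1 - expectedRate);
  return Math.ceil(((Z_ALPHA + Z_BETA) ** 2 * variance) / (delta * delta));
}

// Example: planning to detect a move from 32% to 35% conversion
const needed = sessionsPerVariant(0.32, 0.35);
console.log(`Plan for about ${needed} sessions per variant`);
```

Smaller expected lifts push the requirement up quickly, so a fixed session count is best treated as a floor rather than a guarantee of significance.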

Creating Variants

Define your experiment variants in configuration:

yaml

# experiment-config.yaml
experiment:
  name: negotiation-tone-test
  description: Test friendly vs professional negotiation tone
 
  variants:
    - id: control
      weight: 50
      prompt: |
        You are a professional sales agent. Be courteous and business-like.
        Focus on value and product benefits.
 
    - id: treatment
      weight: 50
      prompt: |
        You are a friendly shopping assistant. Be warm and conversational.
        Build rapport while helping customers find what they need.
 
  metrics:
    primary: conversion_rate
    secondary:
      - average_order_value
      - customer_satisfaction
      - session_duration
 
  duration: 14d
  min_sessions: 1000
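
Before launching, it can help to sanity-check a parsed configuration. The sketch below is illustrative only: the `ExperimentConfig` shape mirrors the YAML fields above, `validateExperiment` is a hypothetical helper, and the rule that weights must sum to 100 is an assumption based on the 50/50 example rather than documented behavior.

```typescript
// Hypothetical types mirroring the YAML example; not an official schema.
interface Variant {
  id: string;
  weight: number;
  prompt: string;
}

interface ExperimentConfig {
  name: string;
  variants: Variant[];
  metrics: { primary: string; secondary: string[] };
  minSessions: number;
}

// Returns a list of human-readable problems; an empty list means the config passed.
function validateExperiment(config: ExperimentConfig): string[] {
  const errors: string[] = [];

  if (config.variants.length < 2) {
    errors.push("an experiment needs at least two variants");
  }

  const ids = new Set(config.variants.map((v) => v.id));
  if (ids.size !== config.variants.length) {
    errors.push("variant ids must be unique");
  }

  // Assumption: weights are percentages that should add up to 100.
  const totalWeight = config.variants.reduce((sum, v) => sum + v.weight, 0);
  if (totalWeight !== 100) {
    errors.push(`variant weights sum to ${totalWeight}, expected 100`);
  }

  if (!config.metrics.primary) {
    errors.push("a primary metric is required");
  }
  if (config.metrics.secondary.indexOf(config.metrics.primary) !== -1) {
    errors.push("the primary metric should not also be listed as secondary");
  }

  return errors;
}

const toneTest: ExperimentConfig = {
  name: "negotiation-tone-test",
  variants: [
    { id: "control", weight: 50, prompt: "You are a professional sales agent." },
    { id: "treatment", weight: 50, prompt: "You are a friendly shopping assistant." },
  ],
  metrics: {
    primary: "conversion_rate",
    secondary: ["average_order_value", "customer_satisfaction", "session_duration"],
  },
  minSessions: 1000,
};

console.log(validateExperiment(toneTest)); // an empty array when every check passes
```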

Traffic Splitting

Implement traffic splitting in your agent:

typescript

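// Traffic splitting implementation.
// The handler below is a method on your agent class (decorators apply to class
// members); OnACPEvent, Session, and trackEvent are assumed to be in scope.
import { getExperimentVariant } from '@hyperfold/experiments';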
 
@OnACPEvent('session.start')
async handleSessionStart(session: Session) {
  // Assign variant based on experiment config
  const variant = await getExperimentVariant(
    'negotiation-tone-test',
    session.id
  );
 
  // Store variant assignment for consistent experience
  await this.state.set(`session:${session.id}:variant`, variant.id);
 
  // Use variant's system prompt
  this.systemPrompt = variant.prompt;
 
  // Track assignment
  await trackEvent('experiment.assigned', {
    experiment: 'negotiation-tone-test',
    variant: variant.id,
    session_id: session.id,
  });
}

Keep variant assignment consistent within a session. A customer should see the same variant from start to finish; otherwise the session's outcome cannot be attributed to a single prompt.
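
One common way to get stable assignment without storing anything is to hash the session ID into a bucket. The following is a minimal sketch of that idea, not a description of how `getExperimentVariant` works internally; `fnv1a` and `assignVariant` are hypothetical helpers, and weights are assumed to be positive integers.

```typescript
interface WeightedVariant {
  id: string;
  weight: number; // assumed to be a positive integer
}

// 32-bit FNV-1a hash, computed over the string's UTF-16 code units.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

// Deterministically maps (experiment, session) to a variant id.
function assignVariant(
  experiment: string,
  sessionId: string,
  variants: WeightedVariant[]
): string {
  const totalWeight = variants.reduce((sum, v) => sum + v.weight, 0);
  if (variants.length === 0 || totalWeight <= 0) {
    throw new Error("variants must be non-empty with a positive total weight");
  }

  // Including the experiment name keeps buckets independent across experiments.
  const bucket = fnv1a(`${experiment}:${sessionId}`) % totalWeight;

  // Walk the cumulative weights until the bucket falls inside a variant's range.
  let cumulative = 0;
  for (const variant of variants) {
    cumulative += variant.weight;
    if (bucket < cumulative) {
      return variant.id;
    }
  }
  return variants[variants.length - 1].id; // unreachable for integer weights
}

const split: WeightedVariant[] = [
  { id: "control", weight: 50 },
  { id: "treatment", weight: 50 },
];

const first = assignVariant("negotiation-tone-test", "session-123", split);
const second = assignVariant("negotiation-tone-test", "session-123", split);
console.log(first === second); // true: the same session always lands in the same bucket
```

Because the result depends only on the experiment name and session ID, any process can recompute it and agree on the variant. Storing the assignment, as in the handler above, is still useful for auditing and analysis.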

Measuring Results

Track experiment metrics:

bash

# View experiment metrics
$ hyperfold experiments metrics negotiation-tone-test
 
EXPERIMENT: negotiation-tone-test
Status: Running (Day 7 of 14)
 
VARIANT         SESSIONS  CONVERSIONS  CONV RATE  AOV      SIGNIFICANCE
control         2,847     912          32.0%      $156.20  -
treatment       2,891     1,012        35.0%      $148.40  94.2%
 
Primary Metric: conversion_rate
  Treatment shows +3.0% lift (94.2% confidence)
  Need 95% confidence to declare winner
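
In the sample output, the treatment converts better but has a lower AOV, which is exactly the situation guardrail metrics are for. A quick way to weigh the two is revenue per session. The sketch below recomputes it from the numbers in the table; it assumes AOV is averaged over converting sessions with one order per conversion, and the helper names are hypothetical.

```typescript
interface VariantStats {
  sessions: number;
  conversions: number;
  averageOrderValue: number; // assumed: average over converting sessions
}

function conversionRate(stats: VariantStats): number {
  return stats.conversions / stats.sessions;
}

// Revenue per session = conversion rate x average order value.
function revenuePerSession(stats: VariantStats): number {
  return conversionRate(stats) * stats.averageOrderValue;
}

// Figures copied from the sample CLI output above
const control: VariantStats = { sessions: 2847, conversions: 912, averageOrderValue: 156.2 };
const treatment: VariantStats = { sessions: 2891, conversions: 1012, averageOrderValue: 148.4 };

for (const [name, stats] of [["control", control], ["treatment", treatment]] as const) {
  console.log(
    `${name}: ${(conversionRate(stats) * 100).toFixed(1)}% conversion, ` +
      `$${revenuePerSession(stats).toFixed(2)} revenue per session`
  );
}
```

With these figures, revenue per session comes out to roughly $50.04 for control and $51.95 for treatment, so under the stated assumptions the higher conversion rate more than offsets the lower AOV.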

Statistical Analysis

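Hyperfold uses Bayesian analysis to determine statistical significance. The detailed report expresses lift relative to control: moving from 32.0% to 35.0% is a gain of 3.0 percentage points, or about +9.4% in relative terms. To see the full analysis: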

bash

# Get detailed analysis
$ hyperfold experiments analyze negotiation-tone-test --detailed
 
STATISTICAL ANALYSIS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 
Conversion Rate:
  Control:     32.0% (95% CI: 30.3% - 33.7%)
  Treatment:   35.0% (95% CI: 33.3% - 36.7%)
 
  Lift:        +9.4% relative improvement
  P-value:     0.058
 
Recommendation: Continue running - approaching significance
 
# When ready to conclude
$ hyperfold experiments conclude negotiation-tone-test --winner=treatment
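
To build intuition for the confidence intervals in the report, you can recompute them by hand. The sketch below uses the normal-approximation (Wald) interval, which happens to reproduce the intervals shown above; that is an observation about the sample numbers, not a statement about how Hyperfold computes them. The function names are hypothetical.

```typescript
// Normal-approximation (Wald) interval for a conversion rate.
// z = 1.96 corresponds to a 95% interval.
function waldInterval(
  conversions: number,
  sessions: number,
  z: number = 1.96
): [number, number] {
  const rate = conversions / sessions;
  const margin = z * Math.sqrt((rate * (1 - rate)) / sessions);
  return [rate - margin, rate + margin];
}

// Relative lift of the treatment rate over the control rate.
function relativeLift(controlRate: number, treatmentRate: number): number {
  return treatmentRate / controlRate - 1;
}

const asPercent = (x: number): string => `${(x * 100).toFixed(1)}%`;

// Counts copied from the sample metrics output
const controlCI = waldInterval(912, 2847);
const treatmentCI = waldInterval(1012, 2891);

console.log(`Control:   ${asPercent(controlCI[0])} - ${asPercent(controlCI[1])}`);
console.log(`Treatment: ${asPercent(treatmentCI[0])} - ${asPercent(treatmentCI[1])}`);
console.log(`Lift:      ${asPercent(relativeLift(912 / 2847, 1012 / 2891))}`);
```

The two intervals overlap slightly. That alone does not settle whether the difference is significant, which is why the report gives a separate significance figure.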