Dec 14, 2025
A/B Testing Agent Prompts
Hyperfold Team · Agents · Testing
Experiment Design
A/B testing agent prompts helps you optimize for conversion, satisfaction, and revenue. Before running experiments, define clear hypotheses and metrics.
Key considerations:
- Test one variable at a time so any change in the metric can be attributed to it
- Define the primary metric before starting
- Run long enough to reach statistical significance (typically 1,000+ sessions)
- Add guardrail metrics to catch negative side effects
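
To make the session guideline concrete, here is a minimal sketch of a standard sample-size estimate for comparing two conversion rates. It is a generic frequentist planning rule, not something Hyperfold is documented to use. The function name `sessionsPerVariant`, the illustrative rates (32% and 35%), and the z-values (1.96 for 5% two-sided significance, 0.8416 for 80% power) are all assumptions made for the example.

```typescript
// Approximate sessions needed per variant to detect a change in conversion
// rate from `baseline` to `target` with a two-sided two-proportion z-test.
// Hypothetical helper for planning; not part of any Hyperfold package.
function sessionsPerVariant(
  baseline: number,
  target: number,
  zAlpha = 1.96, // 5% two-sided significance (assumed)
  zBeta = 0.8416 // 80% power (assumed)
): number {
  if (baseline === target) {
    throw new Error("baseline and target must differ");
  }
  // Sum of the per-arm Bernoulli variances
  const variance = baseline * (1 - baseline) + target * (1 - target);
  const effect = target - baseline;
  return Math.ceil(((zAlpha + zBeta) ** 2 * variance) / effect ** 2);
}

// Illustrative inputs: 32% baseline, hoping to detect a move to 35%
const needed = sessionsPerVariant(0.32, 0.35);
console.log(`~${needed} sessions per variant`);
```

With these illustrative inputs the estimate comes out to a few thousand sessions per variant, so the 1,000-session figure is better read as a floor than as a target when the expected lift is small.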
Creating Variants
Define your experiment variants in configuration:
# experiment-config.yaml
experiment:
  name: negotiation-tone-test
  description: Test friendly vs professional negotiation tone
  variants:
    - id: control
      weight: 50
      prompt: |
        You are a professional sales agent. Be courteous and business-like.
        Focus on value and product benefits.
    - id: treatment
      weight: 50
      prompt: |
        You are a friendly shopping assistant. Be warm and conversational.
        Build rapport while helping customers find what they need.
  metrics:
    primary: conversion_rate
    secondary:
      - average_order_value
      - customer_satisfaction
      - session_duration
  duration: 14d
  min_sessions: 1000

Traffic Splitting
Implement traffic splitting in your agent:
// Traffic splitting implementation
// Note: OnACPEvent, Session, and trackEvent are not imported in this excerpt;
// import them from wherever your agent project defines them.
import { getExperimentVariant } from '@hyperfold/experiments';

// Method on your agent class
@OnACPEvent('session.start')
async handleSessionStart(session: Session) {
  // Assign variant based on experiment config
  const variant = await getExperimentVariant(
    'negotiation-tone-test',
    session.id
  );

  // Store variant assignment for consistent experience
  await this.state.set(`session:${session.id}:variant`, variant.id);

  // Use variant's system prompt
  this.systemPrompt = variant.prompt;

  // Track assignment
  await trackEvent('experiment.assigned', {
    experiment: 'negotiation-tone-test',
    variant: variant.id,
    session_id: session.id,
  });
}

Ensure consistent variant assignment per session. A customer should see the same variant throughout their entire session.
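
One common way to get consistent assignment without storing anything is to derive the variant deterministically from the session ID. The sketch below illustrates that idea with weighted hash bucketing. It is not the implementation behind `getExperimentVariant`, which the article does not describe; the `Variant` interface and the `assignVariant` function are hypothetical names introduced for the example.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape mirroring the id/weight fields of the experiment config
interface Variant {
  id: string;
  weight: number;
}

// Deterministically map a session to a variant in proportion to the weights.
function assignVariant(
  experiment: string,
  sessionId: string,
  variants: Variant[]
): Variant {
  const total = variants.reduce((sum, v) => sum + v.weight, 0);
  if (total <= 0) {
    throw new Error("variant weights must sum to a positive number");
  }
  // Hash experiment + session so the same session always lands in the same
  // bucket, while different experiments split traffic independently.
  const digest = createHash("sha256")
    .update(`${experiment}:${sessionId}`)
    .digest();
  // Scale the first 4 bytes of the hash to a point in [0, total)
  const point = (digest.readUInt32BE(0) / 0x100000000) * total;
  let cumulative = 0;
  for (const v of variants) {
    cumulative += v.weight;
    if (point < cumulative) return v;
  }
  return variants[variants.length - 1]; // guard against rounding at the edge
}

const variants: Variant[] = [
  { id: "control", weight: 50 },
  { id: "treatment", weight: 50 },
];
const assigned = assignVariant("negotiation-tone-test", "session-123", variants);
console.log(assigned.id);
```

Because the result depends only on the experiment name and session ID, repeated calls for the same session return the same variant even if the stored assignment is lost.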
Measuring Results
Track experiment metrics:
# View experiment metrics
$ hyperfold experiments metrics negotiation-tone-test

EXPERIMENT: negotiation-tone-test
Status: Running (Day 7 of 14)

VARIANT    SESSIONS  CONVERSIONS  CONV RATE  AOV      SIGNIFICANCE
control    2,847     912          32.0%      $156.20  -
treatment  2,891     1,012        35.0%      $148.40  94.2%

Primary Metric: conversion_rate
Treatment shows +3.0% lift (94.2% confidence)
Need 95% confidence to declare winner

Statistical Analysis
Hyperfold uses Bayesian analysis to determine statistical significance:
# Get detailed analysis
$ hyperfold experiments analyze negotiation-tone-test --detailed

STATISTICAL ANALYSIS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Conversion Rate:
  Control:   32.0% (95% CI: 30.3% - 33.7%)
  Treatment: 35.0% (95% CI: 33.3% - 36.7%)
  Lift:      +9.4% relative improvement
  P-value:   0.058

Recommendation: Continue running - approaching significance

# When ready to conclude
$ hyperfold experiments conclude negotiation-tone-test --winner=treatment
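
The interval and lift figures in the detailed analysis can be sanity-checked by hand. The sketch below recomputes them from the session and conversion counts reported earlier, using a normal-approximation (Wald) interval. This is a simple frequentist check, not Hyperfold's Bayesian analysis, whose internals the article does not describe; `Arm`, `waldInterval`, and `relativeLift` are hypothetical names introduced for the example.

```typescript
interface Arm {
  sessions: number;
  conversions: number;
}

// Normal-approximation (Wald) confidence interval for a conversion rate.
function waldInterval(arm: Arm, z = 1.96) {
  const rate = arm.conversions / arm.sessions;
  const margin = z * Math.sqrt((rate * (1 - rate)) / arm.sessions);
  return { rate, lower: rate - margin, upper: rate + margin };
}

// Relative lift: the change in rate as a fraction of the control rate.
function relativeLift(control: Arm, treatment: Arm): number {
  const c = control.conversions / control.sessions;
  const t = treatment.conversions / treatment.sessions;
  return (t - c) / c;
}

// Counts taken from the metrics table in this article
const control: Arm = { sessions: 2847, conversions: 912 };
const treatment: Arm = { sessions: 2891, conversions: 1012 };

const pct = (x: number): string => `${(x * 100).toFixed(1)}%`;
const c = waldInterval(control);
const t = waldInterval(treatment);

console.log(`control:   ${pct(c.rate)} (95% CI: ${pct(c.lower)} - ${pct(c.upper)})`);
console.log(`treatment: ${pct(t.rate)} (95% CI: ${pct(t.lower)} - ${pct(t.upper)})`);
console.log(`absolute difference: ${pct(t.rate - c.rate)} (percentage points)`);
console.log(`relative lift: ${pct(relativeLift(control, treatment))}`);
```

The intervals match the reported 30.3% - 33.7% and 33.3% - 36.7%. The sketch also separates the two ways of stating the improvement: the "+3.0% lift" in the metrics summary is an absolute difference in percentage points, while the "+9.4% relative improvement" divides that difference by the control rate. Computed from raw counts the relative lift is about 9.3%; dividing the rounded rates (35.0 / 32.0) gives 9.4%.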