Dec 14, 2025

A/B Testing Agent Prompts

Hyperfold Team
Agents, Testing

Experiment Design

A/B testing agent prompts lets you optimize for conversion, satisfaction, and revenue based on measured outcomes. Before running an experiment, define a clear hypothesis and the metrics that will confirm or reject it.

Key considerations:

  • Test one variable at a time so that any change in the metric can be attributed to it
  • Define the primary metric before the experiment starts
  • Run until the result is statistically significant (typically 1,000+ sessions)
  • Track guardrail metrics to catch negative side effects
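
The session count needed for significance depends on the baseline rate and on the size of the lift you hope to detect. The sketch below uses a common textbook approximation for comparing two proportions (5% two-sided significance, 80% power). The function name `sessionsPerVariant` and the 32% and 35% inputs are illustrative assumptions, not part of Hyperfold.

```typescript
// Standard normal quantiles for a two-sided 5% significance level and 80% power.
const Z_ALPHA = 1.959964; // z for alpha / 2 = 0.025
const Z_BETA = 0.841621; // z for power = 0.80

/**
 * Approximate sessions needed per variant to detect a change in conversion
 * rate from `baseline` to `target` (both given as fractions, e.g. 0.32).
 * Hypothetical helper for planning purposes only.
 */
function sessionsPerVariant(baseline: number, target: number): number {
  if (baseline === target) {
    throw new Error("baseline and target must differ");
  }
  // Sum of the two binomial variances.
  const variance = baseline * (1 - baseline) + target * (1 - target);
  const effect = target - baseline;
  return Math.ceil(((Z_ALPHA + Z_BETA) ** 2 * variance) / effect ** 2);
}

// Illustrative inputs: a 32% baseline, hoping to detect a move to 35%.
const needed = sessionsPerVariant(0.32, 0.35);
console.log(`about ${needed} sessions per variant`);
```

With these inputs the estimate comes out at a few thousand sessions per variant, so a 1,000-session threshold is better read as a floor than as a guarantee; smaller lifts need considerably more traffic.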

Creating Variants

Define your experiment variants in configuration:

```yaml
# experiment-config.yaml
experiment:
  name: negotiation-tone-test
  description: Test friendly vs professional negotiation tone

  variants:
    - id: control
      weight: 50
      prompt: |
        You are a professional sales agent. Be courteous and business-like.
        Focus on value and product benefits.

    - id: treatment
      weight: 50
      prompt: |
        You are a friendly shopping assistant. Be warm and conversational.
        Build rapport while helping customers find what they need.

  metrics:
    primary: conversion_rate
    secondary:
      - average_order_value
      - customer_satisfaction
      - session_duration

  duration: 14d
  min_sessions: 1000
```
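
A configuration like this is easy to get subtly wrong, for example with duplicate variant ids or weights that do not add up. The following is a minimal sketch of a pre-flight check on the already-parsed config. The `ExperimentConfig` shape mirrors the fields in the YAML above, while `validateExperiment` and the rule that weights must sum to 100 are assumptions made for illustration, not documented Hyperfold behavior.

```typescript
// Shapes mirroring the YAML example; these type names are hypothetical.
interface VariantConfig {
  id: string;
  weight: number;
  prompt: string;
}

interface ExperimentConfig {
  name: string;
  variants: VariantConfig[];
  metrics: { primary: string; secondary?: string[] };
  duration: string;
  min_sessions: number;
}

/** Returns a list of human-readable problems; an empty list means the config passed. */
function validateExperiment(config: ExperimentConfig): string[] {
  const errors: string[] = [];

  if (config.variants.length < 2) {
    errors.push("an experiment needs at least two variants");
  }

  // An id is a duplicate if its first occurrence is at an earlier index.
  const ids = config.variants.map((v) => v.id);
  if (ids.some((id, index) => ids.indexOf(id) !== index)) {
    errors.push("variant ids must be unique");
  }

  // Assumption: weights are percentages and should total 100.
  const totalWeight = config.variants.reduce((sum, v) => sum + v.weight, 0);
  if (totalWeight !== 100) {
    errors.push(`weights sum to ${totalWeight}, expected 100`);
  }

  if (config.variants.some((v) => v.prompt.trim() === "")) {
    errors.push("every variant needs a non-empty prompt");
  }

  if (config.metrics.primary.trim() === "") {
    errors.push("a primary metric is required");
  }

  return errors;
}
```

Running a check like this before launch is cheaper than discovering halfway through the experiment that one variant received no traffic.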

Traffic Splitting

Implement traffic splitting in your agent:

```typescript
// Traffic splitting implementation
import { getExperimentVariant } from '@hyperfold/experiments';

@OnACPEvent('session.start')
async handleSessionStart(session: Session) {
  // Assign variant based on experiment config
  const variant = await getExperimentVariant(
    'negotiation-tone-test',
    session.id
  );

  // Store variant assignment for consistent experience
  await this.state.set(`session:${session.id}:variant`, variant.id);

  // Use variant's system prompt
  this.systemPrompt = variant.prompt;

  // Track assignment
  await trackEvent('experiment.assigned', {
    experiment: 'negotiation-tone-test',
    variant: variant.id,
    session_id: session.id,
  });
}
```

Ensure consistent variant assignment per session. A customer should see the same variant throughout their entire session.
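
One common way to get consistent assignment without a lookup is to hash the session id into a bucket and map buckets to variants by weight. The sketch below illustrates that general technique; it is not a description of how `getExperimentVariant` works internally. The names `fnv1a`, `assignVariant`, and `WeightedVariant` are hypothetical.

```typescript
interface WeightedVariant {
  id: string;
  weight: number;
}

/** 32-bit FNV-1a hash of a string, returned as an unsigned integer. */
function fnv1a(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime, 32-bit multiply
  }
  return hash >>> 0;
}

/**
 * Deterministically picks a variant: the same experiment name and session id
 * always land in the same bucket, so the assignment never changes mid-session.
 */
function assignVariant(
  experiment: string,
  sessionId: string,
  variants: WeightedVariant[]
): WeightedVariant {
  const totalWeight = variants.reduce((sum, v) => sum + v.weight, 0);
  if (variants.length === 0 || totalWeight <= 0) {
    throw new Error("at least one variant with positive weight is required");
  }
  // Including the experiment name keeps buckets independent across experiments.
  const bucket = fnv1a(`${experiment}:${sessionId}`) % totalWeight;
  let cumulative = 0;
  for (const variant of variants) {
    cumulative += variant.weight;
    if (bucket < cumulative) {
      return variant;
    }
  }
  throw new Error("unreachable: bucket is always below the total weight");
}

const exampleVariants: WeightedVariant[] = [
  { id: "control", weight: 50 },
  { id: "treatment", weight: 50 },
];
const firstLookup = assignVariant("negotiation-tone-test", "session-42", exampleVariants);
const secondLookup = assignVariant("negotiation-tone-test", "session-42", exampleVariants);
console.log(firstLookup.id === secondLookup.id); // prints true
```

Because the assignment is a pure function of the ids, it stays stable even if the stored state is lost, and storing it as well (as the handler above does) simply makes it cheap to read back.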

Measuring Results

Track experiment metrics:

```bash
# View experiment metrics
$ hyperfold experiments metrics negotiation-tone-test

EXPERIMENT: negotiation-tone-test
Status: Running (Day 7 of 14)

VARIANT     SESSIONS   CONVERSIONS   CONV RATE   AOV       SIGNIFICANCE
control     2,847      912           32.0%       $156.20   -
treatment   2,891      1,012         35.0%       $148.40   94.2%

Primary Metric: conversion_rate
Treatment shows +3.0% lift (94.2% confidence)
Need 95% confidence to declare winner
```
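
To make the columns in that report concrete, here is a minimal sketch of how per-variant figures such as sessions, conversions, conversion rate, and average order value (AOV) can be derived from raw session outcomes. The `SessionOutcome` and `VariantSummary` shapes and the `summarize` function are hypothetical and do not reflect Hyperfold's event schema.

```typescript
// Hypothetical record of how one session ended.
interface SessionOutcome {
  variant: string;
  converted: boolean;
  orderValue?: number; // present only when the session converted
}

interface VariantSummary {
  sessions: number;
  conversions: number;
  revenue: number;
  conversionRate: number; // conversions / sessions
  averageOrderValue: number; // revenue / conversions
}

/** Aggregates raw outcomes into one summary per variant. */
function summarize(outcomes: SessionOutcome[]): Record<string, VariantSummary> {
  const summary: Record<string, VariantSummary> = {};

  for (const outcome of outcomes) {
    let entry = summary[outcome.variant];
    if (!entry) {
      entry = { sessions: 0, conversions: 0, revenue: 0, conversionRate: 0, averageOrderValue: 0 };
      summary[outcome.variant] = entry;
    }
    entry.sessions += 1;
    if (outcome.converted) {
      entry.conversions += 1;
      entry.revenue += outcome.orderValue ?? 0;
    }
    // Recompute the derived ratios after each update.
    entry.conversionRate = entry.conversions / entry.sessions;
    entry.averageOrderValue = entry.conversions > 0 ? entry.revenue / entry.conversions : 0;
  }

  return summary;
}
```

Note that AOV is averaged over converting sessions only, which is why a variant can win on conversion rate while losing on AOV, as the treatment does in the report above.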

Statistical Analysis

Hyperfold uses Bayesian analysis to determine statistical significance:

```bash
# Get detailed analysis
$ hyperfold experiments analyze negotiation-tone-test --detailed

STATISTICAL ANALYSIS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Conversion Rate:
  Control:   32.0% (95% CI: 30.3% - 33.7%)
  Treatment: 35.0% (95% CI: 33.3% - 36.7%)

  Lift: +9.4% relative improvement
  P-value: 0.058

Recommendation: Continue running - approaching significance

# When ready to conclude
$ hyperfold experiments conclude negotiation-tone-test --winner=treatment
```
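
To help read the interval and lift figures, the sketch below computes a normal-approximation 95% interval for a conversion rate and the relative lift between two rates. This is an illustration only: Hyperfold describes its own analysis as Bayesian, so its exact method may differ. The function names `conversionInterval` and `relativeLift` are hypothetical.

```typescript
interface Interval {
  low: number;
  high: number;
}

/**
 * Normal-approximation (Wald) interval for a conversion rate.
 * z = 1.96 corresponds to a 95% interval.
 */
function conversionInterval(conversions: number, sessions: number, z: number = 1.96): Interval {
  const rate = conversions / sessions;
  const margin = z * Math.sqrt((rate * (1 - rate)) / sessions);
  return { low: Math.max(0, rate - margin), high: Math.min(1, rate + margin) };
}

/** Relative lift of the treatment rate over the control rate, as a fraction. */
function relativeLift(controlRate: number, treatmentRate: number): number {
  return (treatmentRate - controlRate) / controlRate;
}

// Session and conversion counts taken from the metrics report.
const controlInterval = conversionInterval(912, 2847);
const treatmentInterval = conversionInterval(1012, 2891);

const asPercent = (value: number): string => (value * 100).toFixed(1) + "%";
console.log(`control:   ${asPercent(controlInterval.low)} - ${asPercent(controlInterval.high)}`);
console.log(`treatment: ${asPercent(treatmentInterval.low)} - ${asPercent(treatmentInterval.high)}`);

// Using the rounded rates shown in the report (32.0% and 35.0%).
console.log(`relative lift: ${asPercent(relativeLift(0.32, 0.35))}`);
```

The 3.0-point difference between 32.0% and 35.0% is the absolute lift, while dividing that difference by the control rate gives the relative figure of roughly 9.4%. The two intervals overlap slightly, which fits the recommendation to keep the experiment running.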