Auto-Scaling

Automatically scale agents based on traffic and performance metrics.

Overview

Auto-scaling automatically adjusts the number of agent instances based on demand. Scale up during traffic spikes and down during quiet periods to optimize cost and performance.

Configuration

# Configure auto-scaling for an agent
$ hyperfold agent scale sales-negotiator \
  --min=2 \
  --max=50 \
  --target-concurrency=100
Scaling configuration updated:
  Agent:              sales-negotiator
  Min Instances:      2
  Max Instances:      50
  Target Concurrency: 100 requests/instance
# View current scaling status
$ hyperfold agent scale sales-negotiator --status
SCALING STATUS: sales-negotiator
CONFIGURATION
  Min Instances:      2
  Max Instances:      50
  Target Concurrency: 100
CURRENT STATE
  Active Instances:   8
  Current Load:       720 req/min
  Avg Concurrency:    90 req/instance
  CPU Utilization:    65%
  Memory Usage:       412MB/512MB

Advanced Configuration

# Advanced scaling configuration
$ hyperfold agent scale sales-negotiator \
  --min=2 \
  --max=100 \
  --target-concurrency=80 \
  --scale-up-threshold=0.7 \
  --scale-down-threshold=0.3 \
  --cooldown=60
# Scale based on multiple metrics
$ hyperfold agent scale sales-negotiator \
  --metrics='[
    {"metric": "concurrency", "target": 80},
    {"metric": "cpu", "target": 70},
    {"metric": "latency_p95", "max": 500}
  ]'
# Scheduled scaling for predictable traffic
$ hyperfold agent scale sales-negotiator \
  --schedule='[
    {"cron": "0 9 * * 1-5", "min": 10, "max": 100},
    {"cron": "0 18 * * 1-5", "min": 5, "max": 50},
    {"cron": "0 0 * * 6-7", "min": 2, "max": 20}
  ]'
Scheduled scaling configured:
  Weekdays 9 AM:    10-100 instances (business hours)
  Weekdays 6 PM:    5-50 instances (evening)
  Weekends:         2-20 instances (low traffic)

Configuration Options

OptionDescription
--minMinimum instances always running
--maxMaximum instances allowed
--target-concurrencyTarget requests per instance
--cooldownSeconds between scaling decisions
--scheduleTime-based scaling overrides

Scaling Strategies

# TARGET CONCURRENCY (recommended) - Scales based on requests per instance.
# Best for: API-heavy workloads, negotiation agents
$ hyperfold agent scale myagent \
  --strategy=target_concurrency \
  --target-concurrency=100

# CPU UTILIZATION - Scales based on CPU usage percentage.
# Best for: Compute-intensive workloads
$ hyperfold agent scale myagent \
  --strategy=cpu \
  --target-cpu=70

# QUEUE DEPTH - Scales based on pending requests.
# Best for: Batch processing, async workflows
$ hyperfold agent scale myagent \
  --strategy=queue_depth \
  --target-queue=10

# HYBRID - Combines multiple metrics with weights.
# Best for: Complex workloads
$ hyperfold agent scale myagent \
  --strategy=hybrid \
  --metrics='[
    {"metric": "concurrency", "target": 80, "weight": 0.5},
    {"metric": "cpu", "target": 70, "weight": 0.3},
    {"metric": "memory", "target": 80, "weight": 0.2}
  ]'

Monitoring

# Monitor scaling activity
$ hyperfold agent scale-history sales-negotiator --since=24h
# View real-time scaling metrics
$ hyperfold metrics agents sales-negotiator --live

Optimization

Analyze scaling patterns to optimize configuration:

# Optimize scaling configuration
$ hyperfold agent scale-analyze sales-negotiator --since=7d
SCALING ANALYSIS: sales-negotiator (7 days)
CURRENT CONFIGURATION
  Min: 2, Max: 50, Target Concurrency: 100
OBSERVATIONS
 Over-provisioned during off-hours
 Scale-up lag during morning spike
 Target concurrency well-tuned
 Frequent scale oscillation 2-4 PM
RECOMMENDED CONFIGURATION
  --min=1
  --max=50
  --target-concurrency=100
  --cooldown=90
  --schedule='[
    {"cron": "0 8 * * 1-5", "min": 4},
    {"cron": "0 20 * * 1-5", "min": 2},
    {"cron": "0 2 * * *", "min": 1}
  ]'
ESTIMATED SAVINGS
  Current monthly cost:    $2,450
  Optimized monthly cost:  $1,890
  Savings:                 $560 (23%)
Apply recommendations? [Y/n]