Auto-Scaling

Automatically scale agents based on traffic and performance metrics.

Overview

Auto-scaling automatically adjusts the number of agent instances based on demand. Scale up during traffic spikes and down during quiet periods to optimize cost and performance.

Hyperfold uses predictive scaling combined with reactive metrics to minimize latency spikes during traffic increases.

Configuration

bash

# Configure auto-scaling for an agent
$ hyperfold agent scale sales-negotiator \
  --min=2 \
  --max=50 \
  --target-concurrency=100
 
Scaling configuration updated:
  Agent:              sales-negotiator
  Min Instances:      2
  Max Instances:      50
  Target Concurrency: 100 requests/instance
 
# View current scaling status
$ hyperfold agent scale sales-negotiator --status
 
SCALING STATUS: sales-negotiator
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 
CONFIGURATION
  Min Instances:      2
  Max Instances:      50
  Target Concurrency: 100
 
CURRENT STATE
  Active Instances:   8
  Current Load:       720 req/min
  Avg Concurrency:    90 req/instance
  CPU Utilization:    65%
  Memory Usage:       412MB/512MB
 
SCALING ACTIVITY (last hour)
  10:15    Scale up    6 → 8 instances    (traffic spike)
  09:45    Scale up    4 → 6 instances    (approaching target)
  09:00    Stable      4 instances
  08:30    Scale down  6 → 4 instances    (traffic decrease)

Advanced Configuration

bash

# Advanced scaling configuration
$ hyperfold agent scale sales-negotiator \
  --min=2 \
  --max=100 \
  --target-concurrency=80 \
  --scale-up-threshold=0.7 \
  --scale-down-threshold=0.3 \
  --cooldown=60
 
# Scale based on multiple metrics
$ hyperfold agent scale sales-negotiator \
  --metrics='[
    {"metric": "concurrency", "target": 80},
    {"metric": "cpu", "target": 70},
    {"metric": "latency_p95", "max": 500}
  ]'
 
# Scheduled scaling for predictable traffic
$ hyperfold agent scale sales-negotiator \
  --schedule='[
    {"cron": "0 9 * * 1-5", "min": 10, "max": 100},
    {"cron": "0 18 * * 1-5", "min": 5, "max": 50},
    {"cron": "0 0 * * 6-7", "min": 2, "max": 20}
  ]'
 
Scheduled scaling configured:
  Weekdays 9 AM:    10-100 instances (business hours)
  Weekdays 6 PM:    5-50 instances (evening)
  Weekends:         2-20 instances (low traffic)
 
# Configure scaling per agent type
$ hyperfold agent scale fulfillment-agent \
  --min=1 \
  --max=20 \
  --target-concurrency=50 \
  --scale-up-rate=2 \
  --scale-down-rate=1

Configuration Options

Option	Description
`--min`	Minimum instances always running
`--max`	Maximum instances allowed
`--target-concurrency`	Target requests per instance
`--cooldown`	Seconds between scaling decisions
`--schedule`	Time-based scaling overrides

Scaling Strategies

bash

# Scaling strategies comparison
 
TARGET CONCURRENCY (recommended)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Scales based on requests per instance.
Best for: API-heavy workloads, negotiation agents
 
$ hyperfold agent scale myagent \
  --strategy=target_concurrency \
  --target-concurrency=100
 
How it works:
  desired = ceil(current_requests / target_concurrency)
  If 800 requests and target=100: need 8 instances
 
 
CPU UTILIZATION
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Scales based on CPU usage percentage.
Best for: Compute-intensive workloads
 
$ hyperfold agent scale myagent \
  --strategy=cpu \
  --target-cpu=70
 
How it works:
  Scale up when avg CPU > 70%
  Scale down when avg CPU < 40%
 
 
QUEUE DEPTH
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Scales based on pending requests.
Best for: Batch processing, async workflows
 
$ hyperfold agent scale myagent \
  --strategy=queue_depth \
  --target-queue=10
 
How it works:
  desired = ceil(queue_size / target_queue)
  Scale up quickly when queue grows
 
 
HYBRID
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Combines multiple metrics with weights.
Best for: Complex workloads
 
$ hyperfold agent scale myagent \
  --strategy=hybrid \
  --metrics='[
    {"metric": "concurrency", "target": 80, "weight": 0.5},
    {"metric": "cpu", "target": 70, "weight": 0.3},
    {"metric": "memory", "target": 80, "weight": 0.2}
  ]'

Monitoring

bash

# Monitor scaling activity
$ hyperfold agent scale-history sales-negotiator --since=24h
 
SCALING HISTORY: sales-negotiator (24 hours)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 
TIME          ACTION      FROM    TO      REASON
Jan 20 10:15  Scale Up    6       8       Concurrency at 95%
Jan 20 09:45  Scale Up    4       6       Concurrency at 92%
Jan 20 08:30  Scale Down  6       4       Concurrency at 25%
Jan 20 02:00  Scale Down  8       6       Scheduled (night)
Jan 19 22:00  Scale Down  10      8       Traffic decrease
Jan 19 18:30  Scale Up    8       10      Traffic spike
Jan 19 14:00  Scale Up    6       8       Concurrency at 88%
 
SUMMARY
  Scale Up Events:    4
  Scale Down Events:  3
  Avg Instances:      6.2
  Peak Instances:     10
  Min Instances:      4
 
# View real-time scaling metrics
$ hyperfold metrics agents sales-negotiator --live
 
LIVE METRICS: sales-negotiator
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 
[▓▓▓▓▓▓▓▓░░] 8/10 instances
 
Requests:     842/min    [▓▓▓▓▓▓▓▓▓░]
Concurrency:  95/100     [▓▓▓▓▓▓▓▓▓▓]
CPU:          68%        [▓▓▓▓▓▓▓░░░]
Memory:       420MB      [▓▓▓▓▓▓▓▓░░]
Latency p95:  340ms      [▓▓▓▓░░░░░░]
 
Scaling: ⚡ Scale up likely in ~2 minutes
 
(Press q to quit, r to refresh)

Optimization

Analyze scaling patterns to optimize configuration:

bash

# Optimize scaling configuration
$ hyperfold agent scale-analyze sales-negotiator --since=7d
 
SCALING ANALYSIS: sales-negotiator (7 days)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 
CURRENT CONFIGURATION
  Min: 2, Max: 50, Target Concurrency: 100
 
OBSERVATIONS
  ⚠ Over-provisioned during off-hours
    Avg utilization 2-6 AM: 15%
    Recommendation: Reduce min to 1 during night
 
  ⚠ Scale-up lag during morning spike
    Avg scale-up time: 3.2 minutes
    Peak missed capacity: 12%
    Recommendation: Increase min to 4 at 8 AM
 
  ✓ Target concurrency well-tuned
    Avg utilization: 72%
    Recommendation: Keep at 100
 
  ⚠ Frequent scale oscillation 2-4 PM
    Scale events: 8 in 2 hours
    Recommendation: Increase cooldown to 90s
 
RECOMMENDED CONFIGURATION
  --min=1
  --max=50
  --target-concurrency=100
  --cooldown=90
  --schedule='[
    {"cron": "0 8 * * 1-5", "min": 4},
    {"cron": "0 20 * * 1-5", "min": 2},
    {"cron": "0 2 * * *", "min": 1}
  ]'
 
ESTIMATED SAVINGS
  Current monthly cost:    $2,450
  Optimized monthly cost:  $1,890
  Savings:                 $560 (23%)
 
Apply recommendations? [Y/n]

Track scaling costs with Billing & Cost Analysis.