Auto-Scaling

Automatically scale agents based on traffic and performance metrics.

Overview

Auto-scaling automatically adjusts the number of agent instances based on demand. Scale up during traffic spikes and down during quiet periods to optimize cost and performance.

Hyperfold uses predictive scaling combined with reactive metrics to minimize latency spikes during traffic increases.

Configuration

bash
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
# Configure auto-scaling for an agent
$ hyperfold agent scale sales-negotiator \
--min=2 \
--max=50 \
--target-concurrency=100
Scaling configuration updated:
Agent: sales-negotiator
Min Instances: 2
Max Instances: 50
Target Concurrency: 100 requests/instance
# View current scaling status
$ hyperfold agent scale sales-negotiator --status
SCALING STATUS: sales-negotiator
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
CONFIGURATION
Min Instances: 2
Max Instances: 50
Target Concurrency: 100
CURRENT STATE
Active Instances: 8
Current Load: 720 req/min
Avg Concurrency: 90 req/instance
CPU Utilization: 65%
Memory Usage: 412MB/512MB
SCALING ACTIVITY (last hour)
10:15 Scale up 6 → 8 instances (traffic spike)
09:45 Scale up 4 → 6 instances (approaching target)
09:00 Stable 4 instances
08:30 Scale down 6 → 4 instances (traffic decrease)

Advanced Configuration

bash
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
# Advanced scaling configuration
$ hyperfold agent scale sales-negotiator \
--min=2 \
--max=100 \
--target-concurrency=80 \
--scale-up-threshold=0.7 \
--scale-down-threshold=0.3 \
--cooldown=60
# Scale based on multiple metrics
$ hyperfold agent scale sales-negotiator \
--metrics='[
{"metric": "concurrency", "target": 80},
{"metric": "cpu", "target": 70},
{"metric": "latency_p95", "max": 500}
]'
# Scheduled scaling for predictable traffic
$ hyperfold agent scale sales-negotiator \
--schedule='[
{"cron": "0 9 * * 1-5", "min": 10, "max": 100},
{"cron": "0 18 * * 1-5", "min": 5, "max": 50},
{"cron": "0 0 * * 6-7", "min": 2, "max": 20}
]'
Scheduled scaling configured:
Weekdays 9 AM: 10-100 instances (business hours)
Weekdays 6 PM: 5-50 instances (evening)
Weekends: 2-20 instances (low traffic)
# Configure scaling per agent type
$ hyperfold agent scale fulfillment-agent \
--min=1 \
--max=20 \
--target-concurrency=50 \
--scale-up-rate=2 \
--scale-down-rate=1

Configuration Options

OptionDescription
--minMinimum instances always running
--maxMaximum instances allowed
--target-concurrencyTarget requests per instance
--cooldownSeconds between scaling decisions
--scheduleTime-based scaling overrides

Scaling Strategies

bash
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
# Scaling strategies comparison
TARGET CONCURRENCY (recommended)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Scales based on requests per instance.
Best for: API-heavy workloads, negotiation agents
$ hyperfold agent scale myagent \
--strategy=target_concurrency \
--target-concurrency=100
How it works:
desired = ceil(current_requests / target_concurrency)
If 800 requests and target=100: need 8 instances
CPU UTILIZATION
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Scales based on CPU usage percentage.
Best for: Compute-intensive workloads
$ hyperfold agent scale myagent \
--strategy=cpu \
--target-cpu=70
How it works:
Scale up when avg CPU > 70%
Scale down when avg CPU < 40%
QUEUE DEPTH
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Scales based on pending requests.
Best for: Batch processing, async workflows
$ hyperfold agent scale myagent \
--strategy=queue_depth \
--target-queue=10
How it works:
desired = ceil(queue_size / target_queue)
Scale up quickly when queue grows
HYBRID
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Combines multiple metrics with weights.
Best for: Complex workloads
$ hyperfold agent scale myagent \
--strategy=hybrid \
--metrics='[
{"metric": "concurrency", "target": 80, "weight": 0.5},
{"metric": "cpu", "target": 70, "weight": 0.3},
{"metric": "memory", "target": 80, "weight": 0.2}
]'

Monitoring

bash
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
# Monitor scaling activity
$ hyperfold agent scale-history sales-negotiator --since=24h
SCALING HISTORY: sales-negotiator (24 hours)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
TIME ACTION FROM TO REASON
Jan 20 10:15 Scale Up 6 8 Concurrency at 95%
Jan 20 09:45 Scale Up 4 6 Concurrency at 92%
Jan 20 08:30 Scale Down 6 4 Concurrency at 25%
Jan 20 02:00 Scale Down 8 6 Scheduled (night)
Jan 19 22:00 Scale Down 10 8 Traffic decrease
Jan 19 18:30 Scale Up 8 10 Traffic spike
Jan 19 14:00 Scale Up 6 8 Concurrency at 88%
SUMMARY
Scale Up Events: 4
Scale Down Events: 3
Avg Instances: 6.2
Peak Instances: 10
Min Instances: 4
# View real-time scaling metrics
$ hyperfold metrics agents sales-negotiator --live
LIVE METRICS: sales-negotiator
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
[▓▓▓▓▓▓▓▓░░] 8/10 instances
Requests: 842/min [▓▓▓▓▓▓▓▓▓░]
Concurrency: 95/100 [▓▓▓▓▓▓▓▓▓▓]
CPU: 68% [▓▓▓▓▓▓▓░░░]
Memory: 420MB [▓▓▓▓▓▓▓▓░░]
Latency p95: 340ms [▓▓▓▓░░░░░░]
Scaling: ⚡ Scale up likely in ~2 minutes
(Press q to quit, r to refresh)

Optimization

Analyze scaling patterns to optimize configuration:

bash
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
# Optimize scaling configuration
$ hyperfold agent scale-analyze sales-negotiator --since=7d
SCALING ANALYSIS: sales-negotiator (7 days)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
CURRENT CONFIGURATION
Min: 2, Max: 50, Target Concurrency: 100
OBSERVATIONS
⚠ Over-provisioned during off-hours
Avg utilization 2-6 AM: 15%
Recommendation: Reduce min to 1 during night
⚠ Scale-up lag during morning spike
Avg scale-up time: 3.2 minutes
Peak missed capacity: 12%
Recommendation: Increase min to 4 at 8 AM
✓ Target concurrency well-tuned
Avg utilization: 72%
Recommendation: Keep at 100
⚠ Frequent scale oscillation 2-4 PM
Scale events: 8 in 2 hours
Recommendation: Increase cooldown to 90s
RECOMMENDED CONFIGURATION
--min=1
--max=50
--target-concurrency=100
--cooldown=90
--schedule='[
{"cron": "0 8 * * 1-5", "min": 4},
{"cron": "0 20 * * 1-5", "min": 2},
{"cron": "0 2 * * *", "min": 1}
]'
ESTIMATED SAVINGS
Current monthly cost: $2,450
Optimized monthly cost: $1,890
Savings: $560 (23%)
Apply recommendations? [Y/n]
Track scaling costs with Billing & Cost Analysis.