Billing & Cost Analysis

Track spending, set budgets, and optimize your Hyperfold costs.

Overview

The billing dashboard provides complete visibility into your Hyperfold spending. Track costs by component, set budget alerts, and get AI-powered optimization recommendations.

Costs are updated hourly, so no view is truly real-time. For the earliest warning on spend spikes, set budget alerts with low thresholds.

Cost Breakdown

```bash
# View current billing summary
$ hyperfold billing summary

BILLING SUMMARY: January 2025
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
PERIOD: Jan 1 - Jan 20, 2025

TOTAL SPEND               $4,892.50
├─ LLM Inference          $2,450.00  (50%)
├─ Agent Compute          $1,200.00  (25%)
├─ Storage & Database       $480.00  (10%)
├─ Network & Bandwidth      $320.00   (7%)
├─ Integrations             $242.50   (5%)
└─ Support & Services       $200.00   (4%)

PROJECTED MONTH-END       $7,645.00
BUDGET                    $8,000.00
STATUS                    ✓ On track

# Detailed breakdown
$ hyperfold billing breakdown --period=mtd

COST BREAKDOWN (Month to Date)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
LLM INFERENCE                            $2,450.00
  OpenAI GPT-4-Turbo    4.2M tokens      $1,680.00
  OpenAI GPT-4o         1.8M tokens        $540.00
  Embeddings            12M tokens         $230.00

AGENT COMPUTE                            $1,200.00
  sales-negotiator      142 hrs            $710.00
  fulfillment-agent     68 hrs             $340.00
  recommender-agent     30 hrs             $150.00

STORAGE                                    $480.00
  Vector Database       45 GB              $225.00
  Document Storage      120 GB             $180.00
  Logs & Analytics      50 GB               $75.00

INTEGRATIONS                               $242.50
  Shopify API calls     45,000              $90.00
  Stripe API calls      12,000              $72.00
  ShipStation           8,000               $80.50
```
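The PROJECTED MONTH-END figure extrapolates month-to-date spend to the end of the month. As a rough check, a straight-line extrapolation is easy to reproduce (an assumption about the method — the dashboard may weight recent days, so its projection can differ slightly from this sketch):

```python
import calendar
from datetime import date

def project_month_end(mtd_spend: float, as_of: date) -> float:
    """Linearly extrapolate month-to-date spend to the end of the month."""
    days_elapsed = as_of.day
    days_in_month = calendar.monthrange(as_of.year, as_of.month)[1]
    return mtd_spend / days_elapsed * days_in_month

# $4,892.50 spent through Jan 20 of a 31-day month
projection = project_month_end(4892.50, date(2025, 1, 20))
print(f"${projection:,.2f}")  # ~$7,583 — in the same range as the dashboard
```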

Cost Components

| Component     | What's Included                     |
|---------------|-------------------------------------|
| LLM Inference | GPT-4 tokens, embeddings, reasoning |
| Agent Compute | Container runtime, CPU, memory      |
| Storage       | Vector DB, documents, logs          |
| Integrations  | External API calls, webhooks        |

LLM Costs

LLM inference is typically the largest cost component. Analyze token usage to optimize spending:

```bash
# Detailed LLM usage analysis
$ hyperfold billing llm --since=7d

LLM USAGE (7 days)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
TOTAL TOKENS: 8.4M    COST: $1,120.00

BY MODEL
  GPT-4-Turbo             5.2M tokens    $832.00  (74%)
  GPT-4o                  2.4M tokens    $216.00  (19%)
  text-embedding-3-large  0.8M tokens     $72.00   (6%)

BY AGENT
  sales-negotiator        6.1M tokens    $890.00  (79%)
  recommender-agent       1.8M tokens    $180.00  (16%)
  fulfillment-agent       0.5M tokens     $50.00   (4%)

BY OPERATION
  Negotiation reasoning   4.2M tokens    $672.00
  Product search          1.5M tokens    $135.00
  Quote generation        1.2M tokens    $168.00
  Customer context        0.9M tokens     $81.00
  Other                   0.6M tokens     $64.00

EFFICIENCY METRICS
  Avg tokens/session:     847
  Avg tokens/conversion:  2,541
  Cost/conversion:        $0.34
  Sessions/dollar:        3.8

# Per-session LLM costs
$ hyperfold billing llm --session=sess_abc123

SESSION: sess_abc123
  Duration: 32.5s
  Tokens:   1,247
  Cost:     $0.20
  Outcome:  conversion ($155.00)
  ROI:      775x
```
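The per-session ROI above is simply conversion revenue divided by LLM cost. A minimal sketch of these efficiency metrics (function names are illustrative, not a Hyperfold API):

```python
def cost_per_conversion(total_llm_cost: float, conversions: int) -> float:
    """Average LLM spend attributable to each conversion."""
    return total_llm_cost / conversions

def session_roi(conversion_revenue: float, llm_cost: float) -> float:
    """Revenue generated per dollar of LLM spend for one session."""
    return conversion_revenue / llm_cost

# The sess_abc123 example: a $155.00 conversion at $0.20 of LLM cost
roi = session_roi(155.00, 0.20)
print(f"{roi:.0f}x")  # 775x
```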

Budget Alerts

```bash
# Configure budget alerts
$ hyperfold billing budget set \
    --monthly=8000 \
    --alert-threshold=80

Budget configured:
  Monthly limit:  $8,000
  Alert at:       80% ($6,400)
  Current spend:  $4,892.50 (61%)

# Set component-specific budgets
$ hyperfold billing budget set \
    --component=llm \
    --monthly=3000 \
    --alert-threshold=90

$ hyperfold billing budget set \
    --component=compute \
    --monthly=2000 \
    --alert-threshold=85

# View budget status
$ hyperfold billing budget status

BUDGET STATUS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
COMPONENT   BUDGET    SPENT     REMAINING   STATUS
Overall     $8,000    $4,893    $3,107      ✓ 61%
LLM         $3,000    $2,450    $550        ⚠ 82%
Compute     $2,000    $1,200    $800        ✓ 60%
Storage     $800      $480      $320        ✓ 60%

ALERTS
⚠ LLM spend at 82% of budget
  Projected month-end: $3,850 (128% of budget)
  Recommendation: Review token usage or increase budget

# Budget alert notification settings
$ hyperfold billing budget alerts \
    --channels="slack:#finance,email:billing@company.com" \
    --frequency=daily
```

Cost Optimization

```bash
# Get cost optimization recommendations
$ hyperfold billing optimize

COST OPTIMIZATION RECOMMENDATIONS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
1. SWITCH TO GPT-4O-MINI FOR SIMPLE TASKS
   Current: Using GPT-4-Turbo for all operations
   Analysis: 35% of requests are simple lookups
   Recommendation: Route simple queries to GPT-4o-mini
   Estimated savings: $420/month (17%)

2. ENABLE RESPONSE CACHING
   Current: No caching for similar queries
   Analysis: 12% query similarity detected
   Recommendation: Enable semantic caching
   Estimated savings: $180/month (7%)

3. OPTIMIZE AGENT SCALING
   Current: Min instances = 2 (24/7)
   Analysis: 2 AM - 6 AM traffic < 10 req/min
   Recommendation: Reduce min to 1 during off-hours
   Estimated savings: $150/month (6%)

4. REDUCE EMBEDDING DIMENSIONS
   Current: Using 3072-dimension embeddings
   Analysis: 1536 dimensions sufficient for your data
   Recommendation: Switch to smaller embeddings
   Estimated savings: $60/month (2%)

TOTAL POTENTIAL SAVINGS: $810/month (33%)
Apply all recommendations? [Y/n]

# Compare costs across periods
$ hyperfold billing compare --period1=dec --period2=jan

COST COMPARISON: December vs January
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
COMPONENT   DECEMBER   JANUARY   CHANGE
LLM         $2,100     $2,450    +$350 (+17%)
Compute     $980       $1,200    +$220 (+22%)
Storage     $420       $480      +$60  (+14%)
Total       $3,500     $4,130    +$630 (+18%)

DRIVERS
+ Session volume up 25%
+ Avg tokens/session up 8%
- Compute efficiency improved 5%

REVENUE COMPARISON
December revenue: $156,000
January revenue:  $198,000 (+27%)

Cost as % of revenue:
  December: 2.2%
  January:  2.1% (improved)
```
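The comparison report reduces to per-component percentage deltas plus cost as a share of revenue. A sketch of that arithmetic, using the figures above:

```python
def pct_change(old: float, new: float) -> float:
    """Percentage change from one period to the next."""
    return (new - old) / old * 100

def cost_ratio(cost: float, revenue: float) -> float:
    """Cost as a percentage of revenue."""
    return cost / revenue * 100

for name, dec, jan in [("LLM", 2100, 2450),
                       ("Compute", 980, 1200),
                       ("Storage", 420, 480)]:
    print(f"{name:8} {pct_change(dec, jan):+.0f}%")

print(f"Dec: {cost_ratio(3500, 156_000):.1f}% of revenue")  # 2.2%
print(f"Jan: {cost_ratio(4130, 198_000):.1f}% of revenue")  # 2.1%
```

Tracking cost as a percentage of revenue, rather than absolute spend, is what lets an 18% cost increase still count as an improvement here.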

Optimization Strategies

Model Selection

Use smaller, faster models for simple tasks. Route complex reasoning to GPT-4 only when needed.
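A minimal sketch of tiered routing: classify the request, then send only complex reasoning to the expensive model. The model names come from the reports above; the keyword classifier is a deliberately naive stand-in for real intent detection:

```python
# Assumed markers of simple lookups; a production router would classify intent.
SIMPLE_KEYWORDS = {"price", "stock", "status", "lookup", "track"}

def pick_model(query: str) -> str:
    """Route simple lookups to a cheaper model, reasoning to the larger one."""
    words = set(query.lower().split())
    if words & SIMPLE_KEYWORDS:
        return "gpt-4o-mini"
    return "gpt-4-turbo"

print(pick_model("what is the price of SKU-1042"))        # gpt-4o-mini
print(pick_model("negotiate a bulk discount on my quote"))  # gpt-4-turbo
```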

Response Caching

Cache responses for semantically similar queries. Reduces token usage without affecting quality.

Prompt Optimization

Shorter, more focused prompts use fewer tokens. Review verbose system prompts for trimming opportunities.
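To gauge whether trimming is worthwhile, estimate token counts before and after. The sketch below uses the common rough heuristic of ~4 characters per token for English text; for exact counts, use your model's tokenizer:

```python
def rough_tokens(text: str) -> int:
    """Crude token estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

verbose = ("You are a helpful, friendly, knowledgeable assistant who always "
           "strives to provide thorough, detailed, comprehensive answers.")
trimmed = "You are a concise sales assistant."

saved = rough_tokens(verbose) - rough_tokens(trimmed)
print(f"~{saved} tokens saved per request")
```

Savings on the system prompt compound, since it is resent with every request.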

Smart Scaling

Reduce minimum instances during off-peak hours. Use scheduled scaling for predictable traffic patterns.
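The off-peak window and instance counts below come from the optimization report above; the schedule helper itself is an illustrative sketch, not a Hyperfold API:

```python
# Off-peak window from the optimization report: 2 AM - 6 AM (< 10 req/min)
OFF_PEAK_HOURS = range(2, 6)

def min_instances(hour: int, peak_min: int = 2, off_peak_min: int = 1) -> int:
    """Scheduled floor for the autoscaler: drop to one instance overnight."""
    return off_peak_min if hour in OFF_PEAK_HOURS else peak_min

print(min_instances(3))   # 1 (overnight floor)
print(min_instances(14))  # 2 (daytime floor)
```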

For infrastructure scaling configuration, see Auto-Scaling.