Billing & Cost Analysis
Track spending, set budgets, and optimize your Hyperfold costs.
Overview
The billing dashboard provides complete visibility into your Hyperfold spending. Track costs by component, set budget alerts, and get AI-powered optimization recommendations.
Costs are updated hourly. For real-time spend tracking, set up budget alerts with low thresholds.
Cost Breakdown
```bash
# View current billing summary
$ hyperfold billing summary

BILLING SUMMARY: January 2025
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

PERIOD: Jan 1 - Jan 20, 2025

TOTAL SPEND                    $4,892.50
├─ LLM Inference               $2,450.00  (50%)
├─ Agent Compute               $1,200.00  (25%)
├─ Storage & Database            $480.00  (10%)
├─ Network & Bandwidth           $320.00  (7%)
├─ Integrations                  $242.50  (5%)
└─ Support & Services            $200.00  (4%)

PROJECTED MONTH-END            $7,645.00
BUDGET                         $8,000.00
STATUS                         ✓ On track

# Detailed breakdown
$ hyperfold billing breakdown --period=mtd

COST BREAKDOWN (Month to Date)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

LLM INFERENCE                               $2,450.00
  OpenAI GPT-4-Turbo      4.2M tokens       $1,680.00
  OpenAI GPT-4o           1.8M tokens         $540.00
  Embeddings              12M tokens          $230.00

AGENT COMPUTE                               $1,200.00
  sales-negotiator        142 hrs             $710.00
  fulfillment-agent       68 hrs              $340.00
  recommender-agent       30 hrs              $150.00

STORAGE                                       $480.00
  Vector Database         45 GB               $225.00
  Document Storage        120 GB              $180.00
  Logs & Analytics        50 GB                $75.00

INTEGRATIONS                                  $242.50
  Shopify API calls       45,000               $90.00
  Stripe API calls        12,000               $72.00
  ShipStation             8,000                $80.50
```
Cost Components
| Component | What's Included |
|---|---|
| LLM Inference | GPT-4 tokens, embeddings, reasoning |
| Agent Compute | Container runtime, CPU, memory |
| Storage | Vector DB, documents, logs |
| Integrations | External API calls, webhooks |
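The percentages in the sample summary appear to be each component's share of total month-to-date spend. The sketch below reproduces that arithmetic from the sample figures and adds a naive linear run-rate projection. The function names are hypothetical, and the linear method is an assumption: it yields roughly $7,583 for the sample period, not the $7,645 the CLI reports, so Hyperfold presumably projects differently.

```python
import calendar
from datetime import date

# Month-to-date figures copied from the sample `billing summary` output.
SPEND = {
    "LLM Inference": 2450.00,
    "Agent Compute": 1200.00,
    "Storage & Database": 480.00,
    "Network & Bandwidth": 320.00,
    "Integrations": 242.50,
    "Support & Services": 200.00,
}


def component_shares(spend: dict[str, float]) -> dict[str, float]:
    """Return each component's share of total spend, in percent."""
    total = sum(spend.values())
    return {name: 100 * cost / total for name, cost in spend.items()}


def linear_projection(spend_to_date: float, as_of: date) -> float:
    """Naive projection (an assumption): average daily spend times days in the month."""
    days_in_month = calendar.monthrange(as_of.year, as_of.month)[1]
    return spend_to_date / as_of.day * days_in_month


total = sum(SPEND.values())
shares = component_shares(SPEND)
projected = linear_projection(total, date(2025, 1, 20))

for name, pct in shares.items():
    print(f"{name:<22}{pct:5.1f}%")
print(f"Naive month-end projection: ${projected:,.2f}")
```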
LLM Costs
LLM inference is typically the largest cost component. Analyze token usage to optimize spending:
```bash
# Detailed LLM usage analysis
$ hyperfold billing llm --since=7d

LLM USAGE (7 days)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

TOTAL TOKENS: 8.4M
COST: $1,120.00

BY MODEL
  GPT-4-Turbo               5.2M tokens    $832.00  (74%)
  GPT-4o                    2.4M tokens    $216.00  (19%)
  text-embedding-3-large    0.8M tokens     $72.00  (6%)

BY AGENT
  sales-negotiator          6.1M tokens    $890.00  (79%)
  recommender-agent         1.8M tokens    $180.00  (16%)
  fulfillment-agent         0.5M tokens     $50.00  (4%)

BY OPERATION
  Negotiation reasoning     4.2M tokens    $672.00
  Product search            1.5M tokens    $135.00
  Quote generation          1.2M tokens    $168.00
  Customer context          0.9M tokens     $81.00
  Other                     0.6M tokens     $64.00

EFFICIENCY METRICS
  Avg tokens/session:       847
  Avg tokens/conversion:    2,541
  Cost/conversion:          $0.34
  Sessions/dollar:          3.8

# Per-session LLM costs
$ hyperfold billing llm --session=sess_abc123

SESSION: sess_abc123
  Duration:   32.5s
  Tokens:     1,247
  Cost:       $0.20
  Outcome:    conversion ($155.00)
  ROI:        775x
```
Budget Alerts
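A budget alert comes down to comparing spend against a percentage of a monthly limit. The sketch below illustrates that arithmetic using the figures from this section's sample output. It is not Hyperfold's implementation: the `Budget` class and `budget_status` function are hypothetical, and because the rule Hyperfold uses to raise a warning is not specified here, the sketch reports the threshold check and the projected overrun as two separate flags.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    component: str
    monthly_limit: float    # dollars per month
    alert_threshold: float  # percent of the limit that triggers an alert


def budget_status(budget: Budget, spent: float, projected: float) -> dict:
    """Summarize spend against a budget (hypothetical helper)."""
    used_pct = 100 * spent / budget.monthly_limit
    return {
        "component": budget.component,
        "alert_at": budget.monthly_limit * budget.alert_threshold / 100,
        "remaining": budget.monthly_limit - spent,
        "used_pct": used_pct,
        "projected_pct": 100 * projected / budget.monthly_limit,
        "threshold_crossed": used_pct >= budget.alert_threshold,
        "projected_overrun": projected > budget.monthly_limit,
    }


# Figures taken from the sample budget output in this section.
overall_status = budget_status(Budget("Overall", 8000, 80), spent=4892.50, projected=7645.00)
llm_status = budget_status(Budget("LLM", 3000, 90), spent=2450.00, projected=3850.00)

for status in (overall_status, llm_status):
    print(
        f"{status['component']:<8} used {status['used_pct']:.0f}% "
        f"projected {status['projected_pct']:.0f}% "
        f"overrun={status['projected_overrun']}"
    )
```

With the sample numbers, LLM spend is below its 90% threshold but is projected to exceed the limit, which is why tracking the projection separately from the threshold can be useful.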
```bash
# Configure budget alerts
$ hyperfold billing budget set \
    --monthly=8000 \
    --alert-threshold=80

Budget configured:
  Monthly limit: $8,000
  Alert at: 80% ($6,400)
  Current spend: $4,892.50 (61%)

# Set component-specific budgets
$ hyperfold billing budget set \
    --component=llm \
    --monthly=3000 \
    --alert-threshold=90

$ hyperfold billing budget set \
    --component=compute \
    --monthly=2000 \
    --alert-threshold=85

# View budget status
$ hyperfold billing budget status

BUDGET STATUS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

COMPONENT    BUDGET    SPENT     REMAINING   STATUS
Overall      $8,000    $4,893    $3,107      ✓ 61%
LLM          $3,000    $2,450    $550        ⚠ 82%
Compute      $2,000    $1,200    $800        ✓ 60%
Storage      $800      $480      $320        ✓ 60%

ALERTS
  ⚠ LLM spend at 82% of budget
    Projected month-end: $3,850 (128% of budget)
    Recommendation: Review token usage or increase budget

# Budget alert notification settings
$ hyperfold billing budget alerts \
    --channels="slack:#finance,email:billing@company.com" \
    --frequency=daily
```
Cost Optimization
```bash
# Get cost optimization recommendations
$ hyperfold billing optimize

COST OPTIMIZATION RECOMMENDATIONS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

1. SWITCH TO GPT-4O-MINI FOR SIMPLE TASKS
   Current: Using GPT-4-Turbo for all operations
   Analysis: 35% of requests are simple lookups
   Recommendation: Route simple queries to GPT-4o-mini
   Estimated savings: $420/month (17%)

2. ENABLE RESPONSE CACHING
   Current: No caching for similar queries
   Analysis: 12% query similarity detected
   Recommendation: Enable semantic caching
   Estimated savings: $180/month (7%)

3. OPTIMIZE AGENT SCALING
   Current: Min instances = 2 (24/7)
   Analysis: 2 AM - 6 AM traffic < 10 req/min
   Recommendation: Reduce min to 1 during off-hours
   Estimated savings: $150/month (6%)

4. REDUCE EMBEDDING DIMENSIONS
   Current: Using 3072-dimension embeddings
   Analysis: 1536 dimensions sufficient for your data
   Recommendation: Switch to smaller embeddings
   Estimated savings: $60/month (2%)

TOTAL POTENTIAL SAVINGS: $810/month (33%)

Apply all recommendations? [Y/n]

# Compare costs across periods
$ hyperfold billing compare --period1=dec --period2=jan

COST COMPARISON: December vs January
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

COMPONENT    DECEMBER    JANUARY    CHANGE
LLM          $2,100      $2,450     +$350 (+17%)
Compute      $980        $1,200     +$220 (+22%)
Storage      $420        $480       +$60 (+14%)
Total        $3,500      $4,130     +$630 (+18%)

DRIVERS
  + Session volume up 25%
  + Avg tokens/session up 8%
  - Compute efficiency improved 5%

REVENUE COMPARISON
  December revenue: $156,000
  January revenue:  $198,000 (+27%)

  Cost as % of revenue:
    December: 2.2%
    January:  2.1% (improved)
```
Optimization Strategies
Model Selection
Use smaller, faster models for simple tasks, and route only complex reasoning to GPT-4.
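As a rough illustration of this kind of routing, the sketch below picks a model tier from the operation type and prompt length. The operation names, the length cutoff, and the `choose_model` function are all hypothetical; the model labels are borrowed from the sample recommendation output, and how Hyperfold actually routes requests is not described here.

```python
# Hypothetical set of operations considered "simple lookups".
SIMPLE_OPERATIONS = {"product_lookup", "order_status", "faq"}


def choose_model(operation: str, prompt: str, max_simple_chars: int = 500) -> str:
    """Return the small model for short, simple requests, else the large one."""
    if operation in SIMPLE_OPERATIONS and len(prompt) <= max_simple_chars:
        return "gpt-4o-mini"
    return "gpt-4-turbo"


print(choose_model("order_status", "Where is my order?"))
print(choose_model("negotiation", "Counter the buyer's offer on a bulk order."))
```

A static rule like this is easy to audit. A classifier-based router could catch more cases, at the cost of an extra inference step.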
Response Caching
Cache responses for semantically similar queries to reduce token usage without affecting quality.
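The idea can be sketched as a cache keyed on query similarity, not on exact text. In the example below, a toy bag-of-words cosine similarity stands in for real embedding vectors so that the code is self-contained. The `SimilarityCache` class and the 0.8 threshold are assumptions for illustration, not Hyperfold's caching implementation.

```python
import math
import re
from collections import Counter


def vectorize(text: str) -> Counter:
    """Toy stand-in for an embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SimilarityCache:
    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold
        self.entries: list[tuple[Counter, str]] = []

    def get(self, query: str):
        """Return the response of the most similar stored query, or None."""
        vec = vectorize(query)
        best_score, best_response = 0.0, None
        for stored_vec, response in self.entries:
            score = cosine(vec, stored_vec)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

    def put(self, query: str, response: str) -> None:
        self.entries.append((vectorize(query), response))


cache = SimilarityCache(threshold=0.8)
cache.put("What is the shipping cost to Canada?", "example cached answer")
hit = cache.get("what is the shipping cost to canada")
miss = cache.get("How do I reset my password?")
print(hit, miss)
```

On a hit, the LLM call is skipped entirely. The threshold controls the trade-off: a lower value saves more tokens but raises the chance of returning an answer to a different question.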
Prompt Optimization
Shorter, more focused prompts use fewer tokens. Review verbose system prompts for trimming opportunities.
Smart Scaling
Reduce minimum instances during off-peak hours. Use scheduled scaling for predictable traffic patterns.
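A minimal sketch of a time-based minimum, assuming the 2 AM to 6 AM off-peak window and the instance counts from the sample recommendation. The `min_instances` function is hypothetical and shows only the decision logic, not how Hyperfold applies scaling settings.

```python
from datetime import time

# Off-peak window taken from the sample recommendation (2 AM - 6 AM local time).
OFF_PEAK_START = time(2, 0)
OFF_PEAK_END = time(6, 0)


def min_instances(now: time, peak_min: int = 2, off_peak_min: int = 1) -> int:
    """Return the minimum instance count for the given local time of day."""
    if OFF_PEAK_START <= now < OFF_PEAK_END:
        return off_peak_min
    return peak_min


print(min_instances(time(3, 30)))
print(min_instances(time(14, 0)))
```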
For infrastructure scaling configuration, see Auto-Scaling.