Limits & budgets

AgentKavach enforces two kinds of limit: budgets, which cap spend, tokens, or runtime, and guardrails, which stop an agent on its behavior (too many calls, or a stuck loop). Each is tracked in memory and enforced independently; the first one an agent reaches terminates it. The budget dimensions are:

Cost. A spending limit in US dollars, available over three periods: daily (resets at midnight UTC), monthly (resets on the first of the month), and total (never resets). Cost is also poolable across every agent in an organization with Budget.org_budget(...).
Token count. A limit on the total tokens an agent may consume, set with max_tokens_per_run.
Duration. A limit on how long an agent may run, set with max_runtime_seconds.

In addition, there are two behavioral guardrails: a call count cap and loop detection. The sections below cover each limit in turn.

Dimensions

Threshold alerts evaluate each dimension independently.

Dimension	Unit	Configured via
`cost`	USD	`budget=Budget.daily / .monthly / .total`
`tokens_total`	tokens	`max_tokens_per_run=...`
`duration`	milliseconds	`max_runtime_seconds=...`

Per-dimension alerts

Bind a channel to a dimension by setting budget_type on the ChannelConfig. The channel fires only when usage on that dimension crosses the threshold.

Budget.daily(limit)

python

from agentkavach import AgentKavach, Budget

guard = AgentKavach(
    provider="openai",
    api_key="ak_prod_...",
    llm_key="sk-...",
    budget=Budget.daily(50),       # $50/day
)

response = guard.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)

print(f"Spent: ${guard.spent:.4f}")
print(f"Remaining: ${guard.remaining:.4f}")
print(f"Utilization: {guard.engine.utilization:.1%}")

A daily budget resets at midnight UTC. Use it to cap per-day spend.

Parameter	Type	Required	Default	Description
`limit`	`float`	Yes	—	USD limit per day. Must be > 0.

Budget.monthly(limit)

python

guard = AgentKavach(
    provider="anthropic",
    api_key="ak_prod_...",
    llm_key="sk-ant-...",
    budget=Budget.monthly(500),     # $500/month
)

A monthly budget resets on the 1st at midnight UTC. Match it to your cloud billing cycle.

Parameter	Type	Required	Default	Description
`limit`	`float`	Yes	—	USD limit per month. Must be > 0.

Budget.total(limit)

python

guard = AgentKavach(
    provider="google",
    api_key="ak_prod_...",
    llm_key="AIza...",
    budget=Budget.total(1000),      # $1,000 lifetime cap
)

A total budget never resets. Once the limit is reached, the agent stays blocked until you raise the limit.

Parameter	Type	Required	Default	Description
`limit`	`float`	Yes	—	USD lifetime limit. Must be > 0. Never resets.
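
The reset rules for the three periods can be sketched with the standard library alone. `next_reset` is a hypothetical helper, not part of the AgentKavach SDK; it only illustrates the boundaries described above (midnight UTC for daily, midnight UTC on the 1st for monthly, no reset for total).

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def next_reset(period: str, now: datetime) -> Optional[datetime]:
    """Return the next UTC reset time for a period, or None for "total"."""
    now = now.astimezone(timezone.utc)
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    if period == "daily":
        return midnight + timedelta(days=1)
    if period == "monthly":
        first = midnight.replace(day=1)
        # After December, roll over to January of the next year.
        if first.month == 12:
            return first.replace(year=first.year + 1, month=1)
        return first.replace(month=first.month + 1)
    if period == "total":
        return None  # a total budget never resets
    raise ValueError(f"unknown period: {period!r}")


now = datetime(2024, 12, 31, 18, 30, tzinfo=timezone.utc)
print(next_reset("daily", now))    # 2025-01-01 00:00:00+00:00
print(next_reset("monthly", now))  # 2025-01-01 00:00:00+00:00
print(next_reset("total", now))    # None
```

On the last day of a month, the daily and monthly boundaries coincide, which is why the first two lines print the same instant.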

Budget.org_budget(limit, period)

python

from agentkavach import AgentKavach, Budget

# Per-agent cap: $10/day
# Org-wide cap: $50/day across every agent
org_pool = Budget.org_budget(limit=50, period="daily")

guard = AgentKavach(
    provider="openai",
    api_key="ak_prod_...",
    llm_key="sk-...",
    agent_name="research-bot",
    budget=Budget.daily(10),        # per-agent
    org_budget=org_pool,            # shared
)

response = guard.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)

print(f"Agent spent: ${guard.spent:.4f}")
print(f"Agent remaining: ${guard.remaining:.4f}")

An org budget applies across every agent in your org. Each call counts toward the shared pool. When the org limit is reached, every agent stops.

Org budgets coexist with per-agent budgets. Each agent keeps its own limit and also draws from the shared pool, and whichever limit is more restrictive wins.

Parameter	Type	Required	Default	Description
`limit`	`float`	Yes	—	USD limit for the entire organization. Must be > 0.
`period`	`str | Period`	No	"daily"	Budget period: "daily", "monthly", or "total".

Most restrictive wins

When both budgets are set, the agent stops as soon as either limit is reached. An agent with a $10/day budget stops at $10 even if the org pool has room. If the org pool hits $50, every agent stops even if their individual budgets have room.
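
The rule reduces to taking the smaller of two headrooms. The sketch below is plain arithmetic rather than SDK code; `remaining_headroom` is a hypothetical name used only for illustration.

```python
def remaining_headroom(agent_spent, agent_limit, org_spent, org_limit):
    """USD an agent can still spend before either limit is reached."""
    agent_left = max(agent_limit - agent_spent, 0.0)
    org_left = max(org_limit - org_spent, 0.0)
    return min(agent_left, org_left)


# Agent at its own $10 cap: blocked even though the org pool has room.
print(remaining_headroom(10.0, 10.0, 30.0, 50.0))  # 0.0
# Org pool exhausted: blocked even though the agent has $6 left.
print(remaining_headroom(4.0, 10.0, 50.0, 50.0))   # 0.0
# Both have room: the smaller headroom applies.
print(remaining_headroom(4.0, 10.0, 47.0, 50.0))   # 3.0
```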

Individual vs. org budgets

Two levels of enforcement work independently or together.

Feature	Individual budget	Org budget
Scope	Single agent	All agents in the org
Set via SDK	`budget=Budget.daily(10)`	`org_budget=Budget.org_budget(50)`
Set via YAML	`agents.<name>.budget.daily`	`org_budget.limit / org_budget.period`
Set via API	`POST /v1/agents/<name>/budgets`	SDK param only — synced via /v1/sync-config
Enforcement	Blocks this agent	Blocks every agent in the org
Ingest rejection	`429 budget_exceeded`	`429 org_budget_exceeded`

Example: Both budgets together

python

from agentkavach import AgentKavach, Budget

# Org-wide cap: $50/day across all agents
org = Budget.org_budget(limit=50, period="daily")

research = AgentKavach(
    provider="openai",
    api_key="ak_prod_...",
    llm_key="sk-...",
    agent_name="research-bot",
    budget=Budget.daily(20),    # individual
    org_budget=org,             # shared
)

support = AgentKavach(
    provider="anthropic",
    api_key="ak_prod_...",
    llm_key="sk-ant-...",
    agent_name="support-bot",
    budget=Budget.daily(15),    # individual
    org_budget=org,             # shared
)

# research-bot stops at $20
# support-bot stops at $15
# Both stop if combined org spend hits $50

Token cap (per run)

A cost budget caps dollars over a period; a token cap limits the total tokens a single run may consume. Set max_tokens_per_run on the client and AgentKavach sums input and output tokens across every call in the run. When the running total would exceed the cap, the SDK raises TokenLimitError before the next call goes out. It is the safety net for a prompt that balloons or a loop that keeps appending context.

python

from agentkavach import AgentKavach, Budget
from agentkavach.exceptions import TokenLimitError

guard = AgentKavach(
    provider="openai",
    api_key="ak_prod_...",
    llm_key="sk-...",
    budget=Budget.daily(50),
    max_tokens_per_run=100_000,     # stop the run at 100k total tokens
)

try:
    response = guard.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Summarize this very long document..."}],
    )
except TokenLimitError as e:
    print(f"Token cap hit: {e.spent}/{e.limit} tokens")

Parameter	Type	Required	Default	Description
`max_tokens_per_run`	`int`	No	None	Total input + output tokens allowed in a single run. None disables the cap.
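
The check-then-record pattern behind a per-run token cap might look like the sketch below. It does not import the SDK: `RunTokenCounter` and `TokenLimitExceeded` are hypothetical stand-ins, and the assumption that the pre-call check uses an estimate of the next request's tokens is ours, not something this page specifies.

```python
class TokenLimitExceeded(Exception):
    """Stand-in for the SDK's TokenLimitError in this sketch."""


class RunTokenCounter:
    def __init__(self, max_tokens_per_run=None):
        self.limit = max_tokens_per_run  # None disables the cap
        self.used = 0

    def check(self, estimated_tokens):
        """Call before a request; raise if it would push the run over the cap."""
        if self.limit is not None and self.used + estimated_tokens > self.limit:
            raise TokenLimitExceeded(
                f"{self.used + estimated_tokens}/{self.limit} tokens"
            )

    def record(self, input_tokens, output_tokens):
        """Call after a response with the usage the provider reported."""
        self.used += input_tokens + output_tokens


counter = RunTokenCounter(max_tokens_per_run=1000)
counter.check(400)
counter.record(input_tokens=400, output_tokens=300)  # running total: 700
counter.check(200)                                   # 900 <= 1000, allowed
try:
    counter.check(400)                               # 1100 > 1000, rejected
except TokenLimitExceeded as exc:
    print(f"Token cap hit: {exc}")
```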

What a “run” means

A run is the lifetime of one AgentKavach instance: every call it makes from the moment you construct it until your process exits. It is not a clock-based window. The max_tokens_per_run, max_runtime_seconds, and max_calls_per_run counters start at zero for each new instance rather than resetting at midnight or on the first of the month. Construct a fresh guard (one per request, or after a restart) and the counters start again from zero. Per-run caps stop a single runaway execution; a cost budget caps spend over time. They are complementary, not interchangeable.

Duration cap (per run)

A duration cap limits how long a run may keep making calls. Set max_runtime_seconds on the client; AgentKavach measures wall-clock time from the first call, and once the elapsed time crosses the cap it raises RuntimeLimitError instead of starting another call. Use it to stop an agent that is stuck retrying or working far longer than a task should take.

python

from agentkavach import AgentKavach, Budget
from agentkavach.exceptions import RuntimeLimitError

guard = AgentKavach(
    provider="openai",
    api_key="ak_prod_...",
    llm_key="sk-...",
    budget=Budget.daily(50),
    max_runtime_seconds=120,        # stop the run after 2 minutes of calls
)

try:
    while True:
        guard.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": "next step"}],
        )
except RuntimeLimitError as e:
    print(f"Runtime cap hit: {e.elapsed:.1f}s / {e.limit:.0f}s")

Parameter	Type	Required	Default	Description
`max_runtime_seconds`	`float`	No	None	Wall-clock seconds, measured from the first call, before the run is stopped. None disables the cap.

Checked between calls

The duration cap is evaluated before each call, so it stops the next call rather than interrupting one already in flight. A single very long call still completes; the cap applies to the run as a whole.
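
A between-calls check of this kind can be sketched as follows. `RunTimer` and `RuntimeLimitExceeded` are hypothetical stand-ins, not SDK classes, and the injectable `clock` parameter exists only so the example runs without waiting; `time.monotonic` is used as the default because it is unaffected by system clock changes.

```python
import time


class RuntimeLimitExceeded(Exception):
    """Stand-in for the SDK's RuntimeLimitError in this sketch."""


class RunTimer:
    def __init__(self, max_runtime_seconds=None, clock=time.monotonic):
        self.limit = max_runtime_seconds  # None disables the cap
        self.clock = clock
        self.started_at = None            # set on the first call, not at construction

    def before_call(self):
        """Run before every call; never interrupts a call already in flight."""
        now = self.clock()
        if self.started_at is None:
            self.started_at = now
            return
        elapsed = now - self.started_at
        if self.limit is not None and elapsed > self.limit:
            raise RuntimeLimitExceeded(f"{elapsed:.1f}s / {self.limit:.0f}s")


# Fake clock: the calls start at t=0s, t=50s, and t=130s.
ticks = iter([0.0, 50.0, 130.0])
timer = RunTimer(max_runtime_seconds=120, clock=lambda: next(ticks))
timer.before_call()      # first call starts the clock
timer.before_call()      # 50s elapsed, allowed
try:
    timer.before_call()  # 130s elapsed, rejected before the call starts
except RuntimeLimitExceeded as exc:
    print(f"Runtime cap hit: {exc}")
```

In this sketch the second call could itself run for well over a minute and still complete; only the third is refused.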

Call count

Set max_calls_per_run to cap how many LLM calls an agent may make. When the cap is reached, AgentKavach terminates the agent and raises CallLimitError. The count is tracked for the lifetime of the AgentKavach instance.

python

from agentkavach import AgentKavach, Budget
from agentkavach.exceptions import CallLimitError

guard = AgentKavach(
    provider="openai",
    api_key="ak_prod_...",
    llm_key="sk-...",
    budget=Budget.daily(50),
    max_calls_per_run=20,   # terminate the agent after 20 calls
)

Parameter	Type	Required	Default	Description
`max_calls_per_run`	`int`	No	None	Maximum LLM calls before the agent is terminated. None disables the cap.
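
As a rough model of a per-instance call cap, consider the sketch below. `CallCounter` and `CallLimitExceeded` are hypothetical stand-ins, and the boundary is an assumption: here the cap is the number of calls allowed, so the call after the last allowed one is the one rejected.

```python
class CallLimitExceeded(Exception):
    """Stand-in for the SDK's CallLimitError in this sketch."""


class CallCounter:
    def __init__(self, max_calls_per_run=None):
        self.limit = max_calls_per_run  # None disables the cap
        self.calls = 0

    def before_call(self):
        # Assumption: `limit` calls are allowed and the next one is rejected.
        if self.limit is not None and self.calls >= self.limit:
            raise CallLimitExceeded(f"{self.calls}/{self.limit} calls")
        self.calls += 1


counter = CallCounter(max_calls_per_run=3)
for _ in range(3):
    counter.before_call()    # calls 1 to 3 are allowed
try:
    counter.before_call()    # call 4 is rejected
except CallLimitExceeded as exc:
    print(f"Call cap hit: {exc}")

# A fresh instance starts a new run with the count back at zero.
print(CallCounter(max_calls_per_run=3).calls)  # 0
```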

Loop detection

Set detect_loops=True to stop an agent that is spinning in circles. AgentKavach watches the recent sequence of calls — each identified by its model and the tool it calls, if any — and looks for a short cycle that repeats back to back. When the same cycle repeats loop_threshold times in a row (for example, making the identical call over and over, or bouncing A → B → A → B), the agent is treated as stuck: AgentKavach terminates it and raises LoopDetectedError. It is off by default, and loop_threshold (default 3) sets how many consecutive repeats trip it.

python

from agentkavach import AgentKavach, Budget
from agentkavach.exceptions import LoopDetectedError

guard = AgentKavach(
    provider="openai",
    api_key="ak_prod_...",
    llm_key="sk-...",
    budget=Budget.daily(50),
    detect_loops=True,
    loop_threshold=5,   # 5 back-to-back repeats of the same call cycle trips it
)

Parameter	Type	Required	Default	Description
`detect_loops`	`bool`	No	False	Enable loop detection.
`loop_threshold`	`int`	No	3	Consecutive identical call patterns before LoopDetectedError is raised.
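
One way to detect a short cycle repeating back to back is to test whether the tail of the call history is a single block repeated `loop_threshold` times. The sketch below is an illustration under assumptions, not the SDK's algorithm: `is_looping` is a hypothetical function, each call is represented as a `(model, tool)` tuple, and `max_cycle_len` (how long a "short" cycle may be) is a value we chose for the example.

```python
def is_looping(history, loop_threshold=3, max_cycle_len=4):
    """True if the tail of `history` is one short cycle repeated back to back."""
    for cycle_len in range(1, max_cycle_len + 1):
        window = cycle_len * loop_threshold
        if len(history) < window:
            break  # longer cycles would need even more history
        tail = history[-window:]
        cycle = tail[:cycle_len]
        # Every position in the tail must match the cycle at the same offset.
        if all(tail[i] == cycle[i % cycle_len] for i in range(window)):
            return True
    return False


A = ("gpt-4o", "search")
B = ("gpt-4o", "fetch")
C = ("gpt-4o", None)      # a call that uses no tool

print(is_looping([A, A, A]))            # True: identical call three times
print(is_looping([A, B, A, B, A, B]))   # True: A, B cycle three times
print(is_looping([A, B, A, B]))         # False: only two repeats so far
print(is_looping([A, B, C, A, B, A]))   # False: no cycle repeats back to back
```

Only the most recent calls are inspected, so an agent that breaks out of a pattern before the threshold is reached is not flagged.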

Combining limits

Budgets and guardrails can be set together on one agent. Each is enforced independently, and the first one reached terminates the agent.

python

guard = AgentKavach(
    provider="openai",
    api_key="ak_prod_...",
    llm_key="sk-...",
    budget=Budget.daily(100),   # cost
    max_tokens_per_run=500_000, # tokens
    max_runtime_seconds=600,    # duration
    max_calls_per_run=100,      # call count
    detect_loops=True,          # loops
)

First limit reached wins

With several limits configured, whichever is reached first terminates the agent; the rest stop being evaluated. See the SDK reference for the exception each one raises.

Checking budget state

python

# Current spend in the active period
guard.spent          # e.g. 12.34

# Remaining budget
guard.remaining      # e.g. 37.66

# Utilization as a fraction (0.0 to 1.0)
guard.engine.utilization  # e.g. 0.2468

Read the current budget state at any time through these properties.
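
The example values above are consistent with a $50 limit. Assuming remaining and utilization are derived from spend and limit in the obvious way (an assumption; this page does not spell out the formulas), the arithmetic is:

```python
limit = 50.00   # e.g. Budget.daily(50)
spent = 12.34

remaining = limit - spent      # dollars left in the active period
utilization = spent / limit    # fraction of the limit already used

print(f"Remaining: ${remaining:.2f}")     # Remaining: $37.66
print(f"Utilization: {utilization:.2%}")  # Utilization: 24.68%
```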

What happens when the budget runs out

The check compares recorded spend against the limit, so the call that first crosses the limit completes and is billed. The kill fires on the way back, and the next call is the one that is rejected.

The call that takes you over the limit completes, and you pay your provider for it.
The on_kill callback fires once, if you configured one.
The next guard.create() raises BudgetExceededError before the request goes out, and so does every call after it.
Only BudgetExceededError propagates from the budget check. AgentKavach's own internal errors never block your LLM calls.

python

from agentkavach.exceptions import BudgetExceededError

try:
    response = guard.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}],
    )
except BudgetExceededError as e:
    print(f"Budget exhausted: {e}")
    # e.spent, e.limit, e.period available
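
The sequence described in this section (the crossing call completes, the kill fires once, later calls are rejected) can be modeled in a few lines. This is a sketch only: `PostCallBudget` and `BudgetExceeded` are hypothetical stand-ins for the SDK's internals, and the `cost` argument replaces a real provider call.

```python
class BudgetExceeded(Exception):
    """Stand-in for the SDK's BudgetExceededError in this sketch."""


class PostCallBudget:
    def __init__(self, limit, on_kill=None):
        self.limit = limit
        self.spent = 0.0
        self.killed = False
        self.on_kill = on_kill

    def call(self, cost):
        # Pre-call check: rejects only after an earlier call reached the limit.
        if self.killed:
            raise BudgetExceeded(f"spent {self.spent:.2f} of {self.limit:.2f}")
        # The provider call would happen here; it is billed either way.
        self.spent += cost
        # Post-call check: the kill fires on the way back, exactly once.
        if self.spent >= self.limit:
            self.killed = True
            if self.on_kill is not None:
                self.on_kill(self.spent)
        return self.spent


kills = []
budget = PostCallBudget(limit=1.00, on_kill=kills.append)
budget.call(0.60)       # under the limit
budget.call(0.60)       # crosses the limit, still completes and is billed
try:
    budget.call(0.10)   # rejected before any request goes out
except BudgetExceeded as exc:
    print(f"Budget exhausted: {exc}")
print(len(kills))       # 1
```

Because the check runs against recorded spend, total spend can overshoot the limit by up to the cost of one call.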

In-memory checks

The budget check itself runs in memory, with no network round trip. OpenAI and Mistral also count tokens locally; Anthropic and Google make one fast token-count request to the provider before the call.