Guardrails
Guardrails are safety limits that prevent runaway agents from consuming excessive resources. They complement budgets by capping token counts, call counts, and runtime, and by detecting infinite loops. When a guardrail fires, the agent is stopped immediately with a specific exception.
ℹ️ Fail-open design: if the SDK itself hits an internal error, the LLM call still proceeds; only budget and guardrail violations stop the agent. See Fail on Error below for strict mode.
Token Limits #
Limit the total number of tokens (input + output) an agent can consume in a single run. The check happens after each LLM call (post-flight), meaning the call that crosses the threshold succeeds but the next call is blocked.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| max_tokens_per_run | int | No | None | Maximum total tokens (input + output) allowed per run. None disables the limit. |
⚠️ Post-flight check: token usage is only known after a call returns, so the call that crosses the threshold still completes and is billed; the next call is blocked.
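The post-flight semantics can be sketched in plain Python. This is an illustrative model, not the SDK's internals; the helper name and call shape are made up for the sketch:

```python
class TokenLimitError(Exception):
    pass

def run_with_token_limit(calls, max_tokens):
    """Post-flight: each call always executes, then the running total is checked."""
    total = 0
    results = []
    for call in calls:
        result, tokens_used = call()   # the call itself always runs (and is billed)
        results.append(result)
        total += tokens_used
        if total > max_tokens:         # checked *after* the call returns
            raise TokenLimitError(f"{total} tokens used, limit {max_tokens}")
    return results
```

With a 10,000-token limit and calls that each use 6,000 tokens, the second call (which crosses the limit) completes, and the run stops before a third call is made.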
```python
from agentkavach import AgentKavach
from agentkavach.exceptions import TokenLimitError

guard = AgentKavach(
    agent_name="summarizer",
    api_key="cg_...",
    max_tokens_per_run=10_000,  # stop after ~10k tokens
)

try:
    for chunk in work_items:
        response = guard.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": chunk}],
        )
        print(response.choices[0].message.content)
except TokenLimitError as e:
    print(f"Token limit reached: {e}")
    # gracefully wrap up the agent run
```

Call Count Limits #
Limit the number of LLM calls an agent can make in a single run. The check happens before each call (pre-flight), so you never pay for the blocked call.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| max_calls_per_run | int | No | None | Maximum number of LLM calls allowed per run. None disables the limit. |
ℹ️ Pre-flight check: the counter is checked before each call, so a blocked call is never sent to the provider and incurs no cost.
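By contrast with the post-flight token check, a pre-flight check rejects before any work happens. A minimal sketch, with illustrative names rather than SDK code:

```python
class CallLimitError(Exception):
    pass

def run_with_call_limit(calls, max_calls):
    """Pre-flight: the counter is checked before each call, so a blocked call never runs."""
    count = 0
    results = []
    for call in calls:
        if count >= max_calls:   # checked *before* the call is made
            raise CallLimitError(f"call limit of {max_calls} reached")
        count += 1
        results.append(call())
    return results
```

With `max_calls=2` and five pending calls, exactly two execute; the third attempt raises without touching the provider.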
```python
from agentkavach import AgentKavach
from agentkavach.exceptions import CallLimitError

guard = AgentKavach(
    agent_name="researcher",
    api_key="cg_...",
    max_calls_per_run=50,  # max 50 LLM calls per run
)

try:
    while has_more_questions():
        response = guard.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": next_question()}],
        )
        process(response)
except CallLimitError as e:
    print(f"Call limit reached after {e.call_count} calls")
    # save partial results and exit gracefully
```

Runtime Limits #
Limit how long an agent can run. The clock starts on the first LLM call and is checked before each subsequent call. If the elapsed time exceeds the limit, a RuntimeLimitError is raised.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| max_runtime_seconds | float | No | None | Maximum wall-clock seconds from the first LLM call. None disables the limit. |
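The timing semantics described above can be modeled with a monotonic clock. This is a conceptual sketch, not the SDK's implementation:

```python
import time

class RuntimeLimitError(Exception):
    pass

def run_with_runtime_limit(calls, max_seconds):
    """The clock starts on the first call; elapsed time is checked before each later call."""
    started = None
    results = []
    for call in calls:
        if started is not None and time.monotonic() - started > max_seconds:
            raise RuntimeLimitError("runtime limit exceeded")
        if started is None:
            started = time.monotonic()   # clock starts on the first call
        results.append(call())
    return results
```

Because the check is pre-flight, an in-progress call is never interrupted; the run stops at the next call boundary after the deadline passes.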
```python
from agentkavach import AgentKavach
from agentkavach.exceptions import RuntimeLimitError

guard = AgentKavach(
    agent_name="deep-researcher",
    api_key="cg_...",
    max_runtime_seconds=300.0,  # 5-minute timeout
)

try:
    while not done:
        response = guard.create(
            model="gpt-4o",
            messages=build_messages(),
        )
        done = evaluate(response)
except RuntimeLimitError as e:
    print(f"Runtime limit exceeded: ran for {e.elapsed:.1f}s")
    # save progress for later resumption
```

Loop Detection #
Detects when an agent gets stuck in a repeating pattern of calls. The detector tracks (model, tool_name) pairs across the last 20 calls and looks for repeating sequences of length 2 to 5. When a sequence repeats more than loop_threshold times, a LoopDetectedError is raised.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| detect_loops | bool | No | False | Enable loop detection. When False, no loop checking is performed. |
| loop_threshold | int | No | 3 | Number of times a sequence must repeat before triggering. Minimum value is 2. |
🚨 Infinite loops can be expensive: an agent stuck repeating the same tool calls keeps spending tokens until another limit stops it, so enable detect_loops for tool-calling agents.
```python
from agentkavach import AgentKavach
from agentkavach.exceptions import LoopDetectedError

guard = AgentKavach(
    agent_name="tool-agent",
    api_key="cg_...",
    detect_loops=True,
    loop_threshold=3,  # fire after 3 repetitions of same pattern
)

try:
    while not done:
        response = guard.create(
            model="gpt-4o",
            messages=messages,
            tools=tool_definitions,
        )
        # process tool calls...
except LoopDetectedError as e:
    print(f"Loop detected: {e.pattern} repeated {e.count} times")
    # break the loop — maybe inject a different prompt
```

Exception Hierarchy #
All guardrail exceptions inherit from GuardrailError, which is separate from BudgetExceededError. This lets you catch budget and guardrail violations independently.
```
Exception
├── BudgetExceededError     # budget limits (daily/monthly/total)
└── GuardrailError          # all guardrail violations
    ├── TokenLimitError     # max_tokens_per_run exceeded
    ├── CallLimitError      # max_calls_per_run exceeded
    ├── RuntimeLimitError   # max_runtime_seconds exceeded
    └── LoopDetectedError   # repeating call pattern detected
```

```python
from agentkavach.exceptions import (
    BudgetExceededError,
    GuardrailError,
    TokenLimitError,
    CallLimitError,
    RuntimeLimitError,
    LoopDetectedError,
)
```
```python
try:
    response = guard.create(model="gpt-4o", messages=msgs)
except BudgetExceededError:
    print("Budget limit hit — agent stopped")
except GuardrailError as e:
    print(f"Guardrail triggered: {type(e).__name__}: {e}")
```

You can also catch specific guardrail types for fine-grained control:
```python
try:
    response = guard.create(model="gpt-4o", messages=msgs)
except TokenLimitError:
    save_partial_results()
except CallLimitError:
    log_and_summarize()
except RuntimeLimitError:
    checkpoint_state()
except LoopDetectedError:
    inject_new_prompt()
except BudgetExceededError:
    alert_team()
```

Combining Guardrails #
All guardrails can be used together alongside budgets and alert channels. Each guardrail is evaluated independently — the first one to trigger stops the agent.
```python
from agentkavach import AgentKavach, Budget
from agentkavach.alerts import AlertRule, ChannelType

guard = AgentKavach(
    agent_name="production-agent",
    api_key="cg_...",
    # Budget
    budget=Budget(daily=5.00, monthly=100.00),
    # Guardrails
    max_tokens_per_run=50_000,
    max_calls_per_run=200,
    max_runtime_seconds=600.0,  # 10 minutes
    detect_loops=True,
    loop_threshold=3,
    # Alert channels
    channels=[
        AlertRule(
            channel_type=ChannelType.SLACK,
            threshold=0.70,
            webhook_url="https://hooks.slack.com/services/T.../B.../xxx",
        ),
        AlertRule(
            channel_type=ChannelType.EMAIL,
            threshold=0.80,
            to="team@example.com",
        ),
        AlertRule(
            channel_type=ChannelType.KILL,
            threshold=1.0,
        ),
    ],
    on_kill=lambda agent, spent, budget: print(f"{agent} killed at ${spent:.2f}"),
)

# The agent is now protected by budget limits, guardrails, and alerts.
# Any violation stops the agent with the appropriate exception.
response = guard.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Analyze this dataset..."}],
)
```

ℹ️ Evaluation order: each guardrail is evaluated independently, and the first one to trigger stops the agent with its specific exception.
Fail on Error #
By default, AgentKavach uses a fail-open design: if the SDK encounters an internal error (e.g., telemetry export failure, engine error), the LLM call still proceeds. Only budget and guardrail violations propagate.
Set fail_on_error=True for strict mode: any internal error will call on_kill (if configured) and raise the exception, stopping the agent immediately.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| fail_on_error | bool | No | False | When True, any internal SDK error (pre-flight, post-flight, export) will kill the agent and raise. When False (default), errors are logged and the call proceeds. |
```python
from agentkavach import AgentKavach, Budget

def stop_agent():
    print("Agent killed due to error!")
    # Cleanup, send notification, etc.

guard = AgentKavach(
    provider="openai",
    llm_key="sk-...",
    agent_name="strict-bot",
    budget=Budget.daily(50),
    on_kill=stop_agent,
    fail_on_error=True,  # Strict mode: errors kill the agent
)

# If the SDK encounters any internal error during this call,
# on_kill() will be called and the error will be raised.
response = guard.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)
```

Save Prompts #
By default, AgentKavach does not store the prompt text sent to LLM providers. Enable save_prompts=True to include prompt content in telemetry events. This is useful for debugging and auditing but increases storage usage.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| save_prompts | bool | No | False | When True, the prompt text is included in telemetry events and visible in the dashboard Event History. When False (default), prompts are not stored. |
```python
guard = AgentKavach(
    provider="openai",
    llm_key="sk-...",
    agent_name="audit-bot",
    budget=Budget.daily(50),
    save_prompts=True,  # Store prompt text in events
)

# Prompts will now appear in the dashboard Event History
response = guard.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Analyze this report..."}],
)
```

⚠️ Privacy considerations: prompts often contain user data or other sensitive content. Enable save_prompts only where storing that text in telemetry is acceptable under your privacy and retention policies.