Agent Loop & Error Recovery

The core agentic loop: receive message, call LLM, execute tools, iterate, respond.

Agent Loop Flow

  1. Persist user message — stored with importance score
  2. Auto-route model — classify query complexity (if not overridden)
  3. Build system prompt — base prompt + matched skills + known facts
  4. Retrieve context — tri-hybrid memory retrieval
  5. Iterate (up to max_iterations):
    • Collect pinned old memories + recent messages (deduplicated)
    • Build OpenAI-format message list
    • Call LLM with error-classified recovery
    • If tool calls → execute each, persist results, continue loop
    • If no tool calls OR final iteration → return text response
  6. Max iterations reached → return timeout message
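The iterate-and-return steps above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names (`run_agent_loop`, `call_llm`, `execute_tool`) and the message shapes are assumptions.

```python
def run_agent_loop(messages, call_llm, execute_tool, max_iterations=10):
    """Sketch of steps 5-6: call the LLM, execute any tool calls,
    and loop until a text response or the iteration cap."""
    for i in range(max_iterations):
        reply = call_llm(messages)  # hypothetical LLM call returning a dict
        tool_calls = reply.get("tool_calls", [])
        if not tool_calls or i == max_iterations - 1:
            # No tool calls, or final iteration: return the text response.
            return reply.get("content", "")
        for call in tool_calls:
            result = execute_tool(call)
            # Persist the tool result and continue the loop.
            messages.append({"role": "tool", "content": result})
    return "Reached max iterations without a final response."
```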

Error Recovery Strategy

The call_llm_with_recovery method classifies errors and responds accordingly:

  • Auth / Billing — return immediately to the user; no retry
  • Rate limit — wait retry_after_secs (capped at 60s), then retry once
  • Timeout / Network / Server error — wait 2s, retry once; if that also fails, fall back to the previous model
  • Not found (bad model) — switch to the fallback model immediately
  • Unknown — propagate as an error
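The strategy table above maps onto a branch per error class. The sketch below is illustrative only: `classify` (mapping an exception to a label) and the `call(model=...)` signature are assumptions, not the real API.

```python
import time

def call_llm_with_recovery(call, classify, fallback_model=None, sleep=time.sleep):
    """Classify an LLM error and apply the matching recovery strategy."""
    try:
        return call()
    except Exception as err:
        kind = classify(err)  # hypothetical classifier: "auth", "rate_limit", ...
        if kind in ("auth", "billing"):
            raise  # return immediately to the user, no retry
        if kind == "rate_limit":
            sleep(min(getattr(err, "retry_after_secs", 60), 60))  # cap at 60s
            return call()  # retry once
        if kind in ("timeout", "network", "server"):
            sleep(2)
            try:
                return call()  # retry once
            except Exception:
                if fallback_model is None:
                    raise
                return call(model=fallback_model)  # fall back to previous model
        if kind == "not_found":
            return call(model=fallback_model)  # bad model: switch immediately
        raise  # unknown: propagate as an error
```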
Last Known Good

After every successful LLM call, the current config is saved as config.toml.lastgood. This enables automatic recovery from bad config changes.

Tool Execution

During the loop, each tool call receives:

  • _session_id — injected automatically for session tracking
  • _untrusted_source — flag set for trigger-originated sessions
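The injection can be pictured as merging reserved keys into each tool call's arguments before dispatch. The key names (`_session_id`, `_untrusted_source`) come from the docs above; the merge function itself is a hypothetical sketch.

```python
def inject_tool_context(args, session_id, untrusted):
    """Return a copy of the tool arguments with the reserved keys injected."""
    injected = dict(args)
    injected["_session_id"] = session_id  # injected for session tracking
    if untrusted:
        injected["_untrusted_source"] = True  # set for trigger-originated sessions
    return injected
```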

Session Types

  • Telegram chat — format: chat ID as string; trusted: yes
  • Email trigger — format: email_trigger; trusted: no
  • Event trigger — format: event_{uuid}; trusted: no
  • Sub-agent — format: sub-{depth}-{uuid}; trusted: inherited from parent
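The table above can be read as a trust check keyed on the session ID format. This sketch assumes trust is derivable from the ID alone plus the parent's trust for sub-agents; the function names are illustrative.

```python
import uuid

def is_trusted(session_id, parent_trusted=True):
    """Map a session ID to its trust level per the session-type table."""
    if session_id == "email_trigger" or session_id.startswith("event_"):
        return False  # trigger-originated sessions are untrusted
    if session_id.startswith("sub-"):
        return parent_trusted  # sub-agents inherit the parent's trust
    return True  # Telegram chat IDs (plain strings) are trusted

def sub_agent_session_id(depth):
    """Build a sub-agent session ID in the sub-{depth}-{uuid} format."""
    return f"sub-{depth}-{uuid.uuid4()}"
```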