Agent Loop & Error Recovery

The core agentic loop: receive message, call LLM, execute tools, iterate, respond.

Agent Loop Flow

  1. Persist user message — stored with importance score
  2. Auto-route model — classify query complexity (if not overridden)
  3. Build system prompt — base prompt + matched skills + known facts
  4. Retrieve context — tri-hybrid memory retrieval
  5. Iterate (up to max_iterations):
    • Collect pinned old memories + recent messages (deduplicated)
    • Build OpenAI-format message list
    • Call LLM with error-classified recovery
    • If tool calls → execute each, persist results, continue loop
    • If no tool calls OR final iteration → return text response
  6. Max iterations reached → return timeout message
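The iterate-and-return steps above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names (`run_agent_loop`, `call_llm`, `execute_tool`) and the message shapes are assumptions.

```python
def run_agent_loop(messages, call_llm, execute_tool, max_iterations=10):
    """Sketch of steps 5-6: call the LLM, execute any tool calls,
    and loop until a text response or the iteration cap."""
    for i in range(max_iterations):
        reply = call_llm(messages)  # hypothetical LLM call returning a dict
        tool_calls = reply.get("tool_calls", [])
        if not tool_calls or i == max_iterations - 1:
            # No tool calls, or final iteration: return the text response.
            return reply.get("content", "")
        for call in tool_calls:
            result = execute_tool(call)
            # Persist the tool result and continue the loop.
            messages.append({"role": "tool", "content": result})
    return "Reached max iterations without a final response."
```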

Error Recovery Strategy

The call_llm_with_recovery method classifies errors and responds accordingly:

  • Auth / Billing — return immediately to the user; no retry
  • Rate limit — wait retry_after_secs (capped at 60s), then retry once
  • Timeout / Network / Server error — wait 2s, retry once; if that also fails, fall back to the previous model
  • Not found (bad model) — switch to the fallback model immediately
  • Unknown — propagate as an error
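The strategy table above maps onto a branch per error class. The sketch below is illustrative only: `classify` (mapping an exception to a label) and the `call(model=...)` signature are assumptions, not the real API.

```python
import time

def call_llm_with_recovery(call, classify, fallback_model=None, sleep=time.sleep):
    """Classify an LLM error and apply the matching recovery strategy."""
    try:
        return call()
    except Exception as err:
        kind = classify(err)  # hypothetical classifier: "auth", "rate_limit", ...
        if kind in ("auth", "billing"):
            raise  # return immediately to the user, no retry
        if kind == "rate_limit":
            sleep(min(getattr(err, "retry_after_secs", 60), 60))  # cap at 60s
            return call()  # retry once
        if kind in ("timeout", "network", "server"):
            sleep(2)
            try:
                return call()  # retry once
            except Exception:
                if fallback_model is None:
                    raise
                return call(model=fallback_model)  # fall back to previous model
        if kind == "not_found":
            return call(model=fallback_model)  # bad model: switch immediately
        raise  # unknown: propagate as an error
```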
Last Known Good

After every successful LLM call, the current config is saved as config.toml.lastgood. This enables automatic recovery from bad config changes.

Tool Execution

During the loop, each tool call receives:

  • _session_id — injected automatically for session tracking
  • _untrusted_source — flag set for trigger-originated sessions
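The injection can be pictured as merging reserved keys into each tool call's arguments before dispatch. The key names (`_session_id`, `_untrusted_source`) come from the docs above; the merge function itself is a hypothetical sketch.

```python
def inject_tool_context(args, session_id, untrusted):
    """Return a copy of the tool arguments with the reserved keys injected."""
    injected = dict(args)
    injected["_session_id"] = session_id  # injected for session tracking
    if untrusted:
        injected["_untrusted_source"] = True  # set for trigger-originated sessions
    return injected
```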

Session Types

  • Telegram chat — format: chat ID as string; trusted: yes
  • Email trigger — format: email_trigger; trusted: no
  • Event trigger — format: event_{uuid}; trusted: no
  • Sub-agent — format: sub-{depth}-{uuid}; trusted: inherited from parent
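The table above can be read as a trust check keyed on the session ID format. This sketch assumes trust is derivable from the ID alone plus the parent's trust for sub-agents; the function names are illustrative.

```python
import uuid

def is_trusted(session_id, parent_trusted=True):
    """Map a session ID to its trust level per the session-type table."""
    if session_id == "email_trigger" or session_id.startswith("event_"):
        return False  # trigger-originated sessions are untrusted
    if session_id.startswith("sub-"):
        return parent_trusted  # sub-agents inherit the parent's trust
    return True  # Telegram chat IDs (plain strings) are trusted

def sub_agent_session_id(depth):
    """Build a sub-agent session ID in the sub-{depth}-{uuid} format."""
    return f"sub-{depth}-{uuid.uuid4()}"
```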