Health Monitoring

Define health probes for your services and get alerted when something breaks.

Probe Types

| Type | Target Format | What It Checks |
| --- | --- | --- |
| `http` | URL | HTTP status code, response body, latency |
| `port` | host:port | TCP connectivity |
| `command` | Shell command | Exit code matches expected (default: 0) |
| `file` | File path | File exists and is not older than `max_age_secs` |
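
The `command` and `file` types use the same `[[health.probes]]` layout as the other types. The fragment below is a minimal sketch: the probe names, targets, and schedules are illustrative, and placing `max_age_secs` directly on the probe entry is an assumption, since the table names the setting but does not say where it is configured.

```toml
# Hypothetical command probe: passes when the command exits with code 0 (the default).
[[health.probes]]
name = "Nginx Service"
probe_type = "command"
target = "systemctl is-active --quiet nginx"
schedule = "every 5m"

# Hypothetical file probe: passes when the file exists and is recent enough.
[[health.probes]]
name = "Nightly Backup"
probe_type = "file"
target = "/var/backups/db/latest.dump"
schedule = "every 5m"
max_age_secs = 90000  # assumed key placement; 90000 seconds is 25 hours
```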

Configuration

config.toml

[health]
enabled = true
tick_interval_secs = 30
result_retention_days = 7

[[health.probes]]
name = "API Server"
probe_type = "http"
target = "https://api.example.com/health"
schedule = "every 5m"
consecutive_failures_alert = 3
latency_threshold_ms = 2000
alert_session_ids = ["123456789"]

[[health.probes]]
name = "Database"
probe_type = "port"
target = "localhost:5432"
schedule = "every 1m"

HTTP Probe Options

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| `timeout_secs` | integer | `10` | Request timeout in seconds |
| `expected_status` | integer | `200` | Expected HTTP status code |
| `expected_body` | string | `null` | Expected substring in response body |
| `method` | string | `"GET"` | HTTP method |
| `headers` | object | `{}` | Custom HTTP headers |
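
As a sketch of how these options might be combined on a single probe, the fragment below overrides the timeout and adds a body check and custom headers. It assumes the options are set directly on the `[[health.probes]]` entry alongside `target`, and that `headers` accepts a TOML inline table. The probe name, URL, token, and expected body are placeholders.

```toml
[[health.probes]]
name = "Orders API"
probe_type = "http"
target = "https://api.example.com/orders/health"
schedule = "every 5m"
method = "GET"               # the default, shown for completeness
timeout_secs = 5             # fail the request after 5 seconds instead of 10
expected_status = 200
expected_body = "healthy"    # substring that must appear in the response body
headers = { Authorization = "Bearer <token>", Accept = "application/json" }
```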

Alerting

When a probe fails `consecutive_failures_alert` times in a row, an alert is sent to every session ID listed in `alert_session_ids`.
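
To make the threshold concrete, the fragment below sketches a probe that runs once a minute and notifies two sessions. The second session ID is a placeholder. With these settings, three failed runs in a row are needed before an alert goes out. A single failed run followed by a passing one would not be expected to alert, assuming a passing run resets the count.

```toml
[[health.probes]]
name = "Database"
probe_type = "port"
target = "localhost:5432"
schedule = "every 1m"
consecutive_failures_alert = 3                   # alert on the 3rd failure in a row
alert_session_ids = ["123456789", "987654321"]   # every listed session is notified
```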

Background Tasks

Tick loop: runs every `tick_interval_secs` seconds (default 30) and executes any probes that are due.
Cleanup: runs at 3:40 AM UTC and removes old results.
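
Because probes are only executed by the tick loop, a probe cannot start sooner than the next tick after it becomes due. The fragment below is a sketch that shortens the tick and lengthens retention. The values are illustrative, and the link between `result_retention_days` and the cleanup task is an assumption based on the key name.

```toml
[health]
enabled = true
tick_interval_secs = 10      # look for due probes every 10 seconds
result_retention_days = 14   # assumed: cleanup removes results older than this
```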

Dynamic Probes

Probes can also be created at runtime by the agent via the `health_probe` tool.
