Health Monitoring

Define health probes for your services and get alerted when something breaks.

Probe Types

Type     Target Format  What It Checks
http     URL            HTTP status code, response body, latency
port     host:port      TCP connectivity
command  Shell command  Exit code matches expected (default: 0)
file     File path      File exists and is not older than max_age_secs

Configuration

config.toml
[health]
enabled = true
tick_interval_secs = 30
result_retention_days = 7

[[health.probes]]
name = "API Server"
probe_type = "http"
target = "https://api.example.com/health"
schedule = "every 5m"
consecutive_failures_alert = 3
latency_threshold_ms = 2000
alert_session_ids = ["123456789"]

[[health.probes]]
name = "Database"
probe_type = "port"
target = "localhost:5432"
schedule = "every 1m"
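
The example above covers only the http and port probe types. A minimal sketch of the other two types might look like the following, assuming they accept the same keys as the probes above. The probe names, targets, and schedules are hypothetical, and placing max_age_secs directly on the probe entry is an assumption that this page does not confirm.

```toml
# Hypothetical command probe: passes when the command's exit code
# matches the expected one (0 by default, per the Probe Types table).
[[health.probes]]
name = "Data Directory"
probe_type = "command"
target = "test -d /var/lib/myapp"
schedule = "every 5m"

# Hypothetical file probe: passes when the file exists and is not
# older than max_age_secs. Setting max_age_secs as a per-probe key
# is an assumption.
[[health.probes]]
name = "Backup Marker"
probe_type = "file"
target = "/var/backups/last_success"
schedule = "every 5m"
max_age_secs = 86400
```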

HTTP Probe Options

Key              Type     Default  Description
timeout_secs     integer  10       Request timeout in seconds
expected_status  integer  200      Expected HTTP status code
expected_body    string   null     Expected substring in response body
method           string   "GET"    HTTP method
headers          object   {}       Custom HTTP headers
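
As an illustration, the options above could be combined on a single HTTP probe roughly as follows. This is a sketch under two assumptions: that the options are set directly on the probe entry alongside name and target, and that headers is written as a TOML inline table. The probe name, target URL, header, and expected body are hypothetical.

```toml
[[health.probes]]
name = "Admin API"
probe_type = "http"
target = "https://api.example.com/admin/health"
schedule = "every 5m"

# HTTP-specific options (placement on the probe entry is an assumption)
timeout_secs = 5                      # fail the request after 5 seconds
expected_status = 200                 # any other status counts as a failure
expected_body = "ok"                  # substring that must appear in the body
method = "GET"
headers = { "X-Health-Check" = "1" }  # hypothetical custom header
```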

Alerting

When a probe fails consecutive_failures_alert times in a row, an alert is sent to all session IDs in alert_session_ids.
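
To make the threshold concrete, here is a sketch of a probe that runs every minute with a threshold of 3. Assuming runs happen on schedule, the alert would go out on the third failed run in a row, roughly two minutes after the first failure. The second session ID is hypothetical, and the comment about a success resetting the count is an inference from "in a row" rather than documented behavior.

```toml
[[health.probes]]
name = "API Server"
probe_type = "http"
target = "https://api.example.com/health"
schedule = "every 1m"

# Run 1 fails -> no alert
# Run 2 fails -> no alert
# Run 3 fails -> alert sent to every ID in alert_session_ids
# A passing run in between presumably resets the count, since the
# failures must be consecutive.
consecutive_failures_alert = 3
alert_session_ids = ["123456789", "987654321"]
```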

Background Tasks

  • Tick loop: runs every tick_interval_secs (default 30) and executes any probes that are due
  • Cleanup: runs daily at 3:40 AM UTC and removes results older than result_retention_days

Dynamic Probes

Probes can also be created at runtime by the agent via the health_probe tool.