OpsTrailsDocs
Console

Incident Response

When an incident fires, the first question is always “what changed?”. OpsTrails lets AI answer that instantly by querying the operational timeline.

The Pattern

  1. Alert fires — PagerDuty, Opsgenie, or your monitoring system triggers an incident notification
  2. Engineer asks AI — Instead of manually checking CI/CD logs and dashboards, the engineer asks their AI assistant “what changed in the last 2 hours?”
  3. OpsTrails returns context — The AI queries the timeline and returns recent events with metric impact analysis
  4. Faster resolution — The engineer has the information they need in seconds, not minutes, cutting MTTR significantly

AI Conversation Example

Engineer:

“Production error rate is spiking. What changed in the last 2 hours?”

AI (via OpsTrails MCP):

I found 2 events in the last 2 hours for the production environment:

  • 1. deployment of api-service v2.1.0 at 14:30 UTC (45 minutes ago)
  • 2. config-change from terraform/production-infra at 14:15 UTC

Checking metrics around the api-service deployment: error rate increased from 0.1% to 2.3% in the 30 minutes after deploy. This deployment is the likely cause.

Key Questions AI Can Answer

Best Practices