Incident Response

When an incident fires, the first question is always “what changed?” OpsTrails lets an AI assistant answer that in seconds by querying the operational timeline.

The Pattern

Alert fires: PagerDuty, Opsgenie, or your monitoring system triggers an incident notification.
Engineer asks AI: Instead of manually checking CI/CD logs and dashboards, the engineer asks their AI assistant “what changed in the last 2 hours?”
OpsTrails returns context: The AI queries the timeline and returns recent events with metric impact analysis.
Faster resolution: The engineer sees the relevant changes in seconds instead of spending minutes digging through logs, which can shorten mean time to resolution (MTTR).
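
The "what changed in the last 2 hours?" step amounts to a time-window filter over the timeline. The sketch below shows that idea in plain Python. The `Event` shape, the `recent_changes` helper, and the sample data (including the date) are assumptions made for illustration, not the OpsTrails schema or API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Event:
    # Hypothetical event shape; the real OpsTrails schema may differ.
    kind: str          # e.g. "deployment", "config-change"
    source: str        # e.g. "api-service"
    environment: str   # e.g. "production"
    timestamp: datetime


def recent_changes(events, now, window, environment):
    """Return events in `environment` from the last `window`, newest first."""
    cutoff = now - window
    matching = [
        e for e in events
        if e.environment == environment and cutoff <= e.timestamp <= now
    ]
    return sorted(matching, key=lambda e: e.timestamp, reverse=True)


# Illustrative timeline; "now" is fixed so the example is reproducible.
now = datetime(2024, 1, 1, 15, 15, tzinfo=timezone.utc)
timeline = [
    Event("deployment", "api-service", "production", now - timedelta(minutes=45)),
    Event("config-change", "terraform/production-infra", "production",
          now - timedelta(minutes=60)),
    Event("deployment", "web-app", "staging", now - timedelta(minutes=10)),
    Event("deployment", "api-service", "production", now - timedelta(hours=5)),
]

changes = recent_changes(timeline, now, timedelta(hours=2), "production")
for e in changes:
    print(e.timestamp.strftime("%H:%M UTC"), e.kind, e.source)
```

The staging deployment and the five-hour-old deployment are excluded, so only the two production changes inside the window remain.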

AI Conversation Example

Engineer:

“Production error rate is spiking. What changed in the last 2 hours?”

AI (via OpsTrails MCP):

I found 2 events in the last 2 hours for the production environment:

1. deployment of api-service v2.1.0 at 14:30 UTC (45 minutes ago)
2. config-change from terraform/production-infra at 14:15 UTC

Checking metrics around the api-service deployment: error rate increased from 0.1% to 2.3% in the 30 minutes after deploy. This deployment is the likely cause.
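
The before/after comparison in the response above can be approximated by averaging a metric over equal windows on either side of the deployment. This is a minimal sketch under assumptions: the `error_rate_change` helper and the sample values are hypothetical, and it does not describe how OpsTrails or any analytics provider computes impact.

```python
from datetime import datetime, timedelta, timezone
from statistics import mean


def error_rate_change(samples, deployed_at, window):
    """Compare the mean error rate in `window` before vs. after `deployed_at`.

    `samples` is a list of (timestamp, error_rate_percent) tuples.
    Returns (before, after), or None if either side has no samples.
    """
    before = [r for t, r in samples if deployed_at - window <= t < deployed_at]
    after = [r for t, r in samples if deployed_at <= t < deployed_at + window]
    if not before or not after:
        return None
    return mean(before), mean(after)


# Illustrative samples around a 14:30 UTC deployment (made-up values).
deployed_at = datetime(2024, 1, 1, 14, 30, tzinfo=timezone.utc)
samples = [
    (deployed_at - timedelta(minutes=20), 0.1),
    (deployed_at - timedelta(minutes=10), 0.1),
    (deployed_at + timedelta(minutes=10), 2.2),
    (deployed_at + timedelta(minutes=20), 2.4),
]

result = error_rate_change(samples, deployed_at, timedelta(minutes=30))
if result is not None:
    before, after = result
    print(f"error rate: {before:.1f}% -> {after:.1f}%")
```

A jump that lines up with a deployment makes it a likely cause, not a proven one, so it is still worth checking other changes in the same window.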

Key Questions AI Can Answer

“What changed in the last hour?”
“Were there any CRITICAL severity events today?”
“Show me all events for the api-service source”
“Did error rates change after the 2pm deployment?”
“What was the last rollback?”
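
Most of these questions reduce to filtering events by type, source, or severity. The sketch below illustrates that mapping. The field names, the `filter_events` helper, and the "incident" and "rollback" event types are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    # Hypothetical fields for illustration only.
    kind: str
    source: str
    severity: Optional[str] = None  # e.g. "MAJOR" or "CRITICAL"


def filter_events(events, *, kind=None, source=None, severities=None):
    """Keep events that match every filter that is not None."""
    return [
        e for e in events
        if (kind is None or e.kind == kind)
        and (source is None or e.source == source)
        and (severities is None or e.severity in severities)
    ]


events = [
    Event("deployment", "api-service"),
    Event("incident", "api-service", "CRITICAL"),
    Event("rollback", "api-service"),
    Event("config-change", "terraform/production-infra"),
]

critical = filter_events(events, severities={"CRITICAL"})  # severity question
from_api = filter_events(events, source="api-service")     # source question
rollbacks = filter_events(events, kind="rollback")         # rollback question
```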

Best Practices

Connect analytics providers: Impact analysis is most powerful when metrics are connected. See Connecting Providers.
Use severity for incidents: Mark incident events with MAJOR or CRITICAL severity to make them easy to find.
Track all change types: Don't just track deployments. Config changes, database migrations, and infrastructure updates are often the cause of incidents.
