Always-On Engineering: Cloud Agents, Routines, and the Event-Driven SDLC

In April we described a workflow where a developer assigns a ticket, walks away for a coffee, and comes back to a pull request. That was still a human-started process. Things have changed quickly in the couple of months since. Cursor shipped cloud agents and Automations. Anthropic shipped Claude Code managed agents and Routines. Both platforms moved the trigger point from "developer presses go" to "an event in the system presses go."

That shift touches every layer of the development process: how cloud agents and automation layers work on each platform, how we added governance when agents started running without us, and what the event-driven software development lifecycle (SDLC) looks like for a team ready to operate at this level. The background engineering team is no longer theoretical.

From Assign-and-Wait to Always-On

The April 2026 workflow was already fast. A developer assigned a ticket, an agent worked through it, and a pull request appeared in under an hour. But the developer still started every run. That is the part that changed.

Cursor shipped cloud agents alongside a new Automations feature. Anthropic shipped cloud agents for Claude Code alongside Routines. Both platforms landed in the same window with the same fundamental shift: agents that wake up on their own, triggered by events in your existing tools rather than by a developer opening an IDE.

Figure: Developer-initiated workflow (Q1 2026) compared with event-driven workflow (June 2026)

Developer-initiated (Q1 2026): developer assigns ticket, agent picks up the work, pull request opened, Slack notification sent, developer reviews PR. A human starts every run.

Event-driven (June 2026): event fires (Linear / Slack / GitHub / PagerDuty), agent triggered automatically, pull request opened, Slack notification sent, developer reviews PR. A system event starts the run.

The internal numbers from Cursor underscore how quickly this model matures in production. More than 35% of merged PRs at Cursor itself now come from autonomous agents running without a developer present. That is not a benchmark from a controlled experiment. That is the share of shipped code that started from an event, not a human.

The practical implication is straightforward. Work that used to wait in the queue until a developer had capacity now starts the moment a trigger condition is met. A Linear or GitHub issue is created, a PagerDuty alert fires, a PR merges, and an agent is already picking up the task before anyone checks the board.

The Engines: What Cloud Agents Actually Do

Cloud agents are a step change from local agent runs. They execute in isolated virtual machines with no dependency on a developer having an IDE open. The agent clones the repository, reads the codebase, writes and tests code, opens a pull request, and notifies our team in Slack. All of that happens remotely, whether the developer is at their desk or not.

Cloud Agent Platforms

Cursor Cloud Agents
Primary interface: Cursor IDE
Runtime: Isolated cloud VM
GitHub integration: Native (branches, PRs)
Event triggers: Linear, Slack, GitHub, PagerDuty, cron
Context memory: Session + codebase context
Notifications: Slack, GitHub PR comments

Claude Code Cloud Agents
Primary interface: CLI + Claude.ai web
Runtime: Isolated cloud VM
GitHub integration: Native (branches, PRs)
Event triggers: Scheduled routines + event-triggered runs
Context memory: Session + codebase context
Notifications: Slack, GitHub PR comments

The two platforms offer similar capability.

Cursor cloud agents

Cursor cloud agents integrate natively with Linear, GitHub, and the project codebase. Long-horizon work is handled through self-summarization: the agent periodically compresses its working context so it can maintain coherence across multi-hour sessions without losing the thread of what it was doing. The 35% autonomous PR rate reflects how reliably this holds up in production.
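The idea of compressing working context can be illustrated with a small sketch. This is not Cursor's implementation, which is not described here; it only shows the general pattern of folding older steps into a summary once the history grows past a budget. The names compact_history and naive_summary, and the character-based budget, are assumptions made for the example.

```python
from typing import Callable, List


def compact_history(
    history: List[str],
    max_chars: int,
    keep_recent: int,
    summarize: Callable[[List[str]], str],
) -> List[str]:
    """Replace older entries with one summary once the history grows too large."""
    total = sum(len(entry) for entry in history)
    if total <= max_chars or len(history) <= keep_recent:
        return list(history)
    # Keep the most recent entries verbatim; fold everything older into a summary.
    split = len(history) - keep_recent
    older, recent = history[:split], history[split:]
    return [summarize(older)] + recent


def naive_summary(entries: List[str]) -> str:
    """Stand-in summarizer; a real agent would ask a model to write this."""
    return f"[summary of {len(entries)} earlier steps]"


if __name__ == "__main__":
    steps = [
        "read repo layout",
        "ran tests: 3 failing",
        "patched parser.py",
        "re-ran tests: all green",
    ]
    print(compact_history(steps, max_chars=40, keep_recent=2, summarize=naive_summary))
```

The recent steps stay verbatim because they are the ones the agent is most likely to need next; only the older tail is compressed.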

Claude Code cloud agents

Claude Code cloud agents offer a parallel capability set: clone the repo, understand context, write code, open PRs, post to Slack. The primary interface is the Claude Code CLI or the Claude.ai web app rather than an IDE. Teams already invested in Cursor tend to stay there. Teams that prefer a CLI-first or cross-editor workflow get the same cloud agent capability from Anthropic directly. Both are production-viable today. The pipeline is identical: repository to code to PR to notification. The entry point differs.

The most important practical consequence of cloud agents is asynchronous development. A developer's attention is no longer the bottleneck for any individual task. The agent works while the developer reviews other PRs, handles architecture decisions, or ends their day. The work does not wait.

The Triggers: Automations, Routines, and Always-On Memory

Cloud agents run the task. Automations and Routines decide when the task starts. This is the layer that turns a capable agent into an always-on team member.

Figure: Event-Driven Agent Triggers. Event sources (Linear issue created, PR merged on GitHub, Slack message received, PagerDuty alert, scheduled cron) feed an Automation or Routine, which launches agent tasks (bug fix, feature work, test generation, dependency audit, code review comment).

Cursor Automations

A Cursor Automation is a definition with three parts: a trigger event, an agent prompt describing the task, and a set of constraints on what the agent is allowed to do. When the trigger fires, the automation launches the agent with those parameters. Trigger options include: a Linear issue created, a PR merged on GitHub, a Slack message matching a pattern, a PagerDuty alert crossing a threshold, and scheduled cron expressions. The automation runs without any developer involvement beyond the initial setup.
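To make the three-part shape concrete, the sketch below models an automation as a plain data structure and matches incoming events against it. This is not Cursor's configuration format; Event, Automation, matching_automations, and the example paths are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Event:
    """An incoming system event (hypothetical shape, for illustration only)."""
    source: str  # e.g. "linear", "github", "slack", "pagerduty", "cron"
    kind: str    # e.g. "issue_created", "pr_merged"


@dataclass(frozen=True)
class Automation:
    """Three parts: a trigger, an agent prompt, and constraints on the agent."""
    name: str
    trigger_source: str
    trigger_kind: str
    prompt: str
    forbidden_paths: Tuple[str, ...] = ()

    def matches(self, event: Event) -> bool:
        return (event.source, event.kind) == (self.trigger_source, self.trigger_kind)


def matching_automations(event: Event, automations: List[Automation]) -> List[Automation]:
    """Return every automation whose trigger matches the event."""
    return [a for a in automations if a.matches(event)]


AUTOMATIONS = [
    Automation(
        name="bug-triage",
        trigger_source="linear",
        trigger_kind="issue_created",
        prompt="Run an initial triage analysis and post the findings.",
        forbidden_paths=("auth/", "payments/"),
    ),
    Automation(
        name="coverage-check",
        trigger_source="github",
        trigger_kind="pr_opened",
        prompt="Check test coverage for the changed files and flag gaps.",
    ),
]

if __name__ == "__main__":
    fired = matching_automations(Event("linear", "issue_created"), AUTOMATIONS)
    print([a.name for a in fired])
```

Keeping the constraints on the same object as the trigger and prompt means a reviewer can see in one place what starts the agent and what it may not touch.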

Claude Code Routines

Claude Code Routines follow the same model. A routine defines a trigger or schedule, the agent task to execute, and the context constraints. Routines can be defined via the Claude Desktop app, the Claude Code CLI, or the Claude.ai web interface, and they persist across sessions with accumulated codebase context. Where Automations are configured in the Cursor interface, Routines live in the Claude Code configuration and can be version-controlled alongside the project.

Memory compounds the value

Both platforms accumulate knowledge across runs. An agent that has touched a codebase 20 times understands its conventions, its naming patterns, and the decisions made in past PRs in a way that a fresh context window cannot replicate. By run 20, agents on our stack produce PRs that require meaningfully less review time than they did at run one. The improvement is real and it compounds.

A few examples from our current stack: a scheduled weekly SEO audit that opens a PR with results and technical fixes (if applicable), a Linear bug-label automation that runs an initial triage analysis and posts findings as a PR comment before any developer looks at the ticket, and an on-PR-open routine that checks test coverage for the changed files and flags gaps in the PR description.

We have also helped a marketing team create weekly routines that analyze their client's site content and specific industry trends, then recommend weekly blog posts that address gaps in the content strategy and improve SEO.

Where the Pipeline Lives Now

The SDLC we operate today has three characteristics that did not exist six months ago. Work starts from system signals rather than developer decisions. Multiple agents work in parallel rather than sequentially. And human time has shifted almost entirely to review, architecture, and escalation handling.

Figure: Parallel Agent Fleet. Three agents work at once: a bug fix agent and a feature agent, each triggered by a Linear issue, and a dependency audit triggered by a cron schedule. Their PRs wait in the review queue, pass through human review (approve / redirect), and then merge and deploy once tests pass.

Parallel agent fleets

Our earlier post, "From Ticket to Deployed: Our Agentic AI Development Workflow," described running three tasks in parallel as a capacity gain. With event-driven automations and routines, the practical number of concurrent tasks is not three: it is whatever the review queue can absorb. On an active project day we routinely have ten or more agent runs completing within the same review window. The constraint is not execution capacity. The constraint is how many PRs a developer can thoughtfully review in a day.

Orchestration patterns

We structure the queue with a few explicit rules. Bug reports and regressions get a higher priority trigger than feature work, which means they start sooner and appear earlier in the review queue. Concurrency limits per repository prevent ten agents from racing to modify the same files at once. And we maintain a list of ticket types that require a developer to start the work manually: any task touching authentication, payment flows, or security-sensitive configuration. Automations handle the routine; humans handle the consequential.
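The three rules above can be sketched as a small priority queue. This is an illustration, not our production orchestrator: the priority values, the per-repository limit of two, and the names AgentQueue, Task, and MANUAL_ONLY_AREAS are all assumptions chosen for the example.

```python
import heapq
from collections import Counter
from dataclasses import dataclass, field

# Lower number starts sooner. Values are illustrative.
PRIORITY = {"bug": 0, "regression": 0, "feature": 1}
# Areas where a developer must start the work manually (hypothetical labels).
MANUAL_ONLY_AREAS = {"authentication", "payments", "security-config"}
MAX_CONCURRENT_PER_REPO = 2  # assumed limit


@dataclass(order=True)
class Task:
    priority: int
    seq: int  # tie-breaker: earlier submissions first
    repo: str = field(compare=False)
    kind: str = field(compare=False)
    area: str = field(compare=False)


class AgentQueue:
    def __init__(self):
        self._heap = []
        self._running = Counter()
        self._seq = 0
        self.needs_human = []

    def submit(self, repo, kind, area):
        task = Task(PRIORITY[kind], self._seq, repo, kind, area)
        self._seq += 1
        if area in MANUAL_ONLY_AREAS:
            self.needs_human.append(task)  # never auto-started
        else:
            heapq.heappush(self._heap, task)

    def next_task(self):
        """Pop the highest-priority task whose repository has a free slot."""
        deferred, chosen = [], None
        while self._heap:
            task = heapq.heappop(self._heap)
            if self._running[task.repo] < MAX_CONCURRENT_PER_REPO:
                chosen = task
                break
            deferred.append(task)
        for task in deferred:  # put back tasks blocked by the repo limit
            heapq.heappush(self._heap, task)
        if chosen is not None:
            self._running[chosen.repo] += 1
        return chosen

    def finish(self, task):
        self._running[task.repo] -= 1
```

A bug submitted after a feature still starts first, and a task for a repository that is already at its limit waits in the heap until finish() frees a slot.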

Where developers spend time now

Reviewing agent PRs is faster than reviewing human PRs. Agents produce scoped diffs with clear descriptions and explicit uncertainty flags. A typical review takes five to fifteen minutes rather than forty-five. The developer time that used to go into writing the code now goes into evaluating it. Architecture decisions that span multiple agent outputs, edge cases the agent flagged but could not resolve, and the occasional escalation where the agent explicitly asked for human judgment: these are where senior engineering time concentrates.

Governance at Agent Scale

When agents run continuously without a developer present, governance is not a nice-to-have. It is what keeps the pipeline reliable. We added five layers of control as we scaled from occasional agent runs to an always-on fleet.

Figure: Governance Layers, listed from the final gate (narrowest) down to the base (widest).

Merge Approval + Deploy: One human sign-off, then ships
PR Review + Escalation Flags: Agent surfaces decisions it was uncertain about
Automated Test Gate: CI must pass, no exceptions
Policy Constraints: What the agent is and is not allowed to touch
Agent Output: Raw code changes on a feature branch

Policy constraints

Every automation and routine we run includes a constraints block that specifies what the agent is not allowed to touch. Authentication modules, payment integrations, environment configuration files, and database migration scripts are off-limits for autonomous agents. The agent can read these files for context but cannot modify them. Any task that would require touching a constrained area is escalated to the queue for human assignment.
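One way to enforce a rule like this outside the agent itself is to check the changed paths of a PR against a list of protected patterns. The sketch below assumes glob-style patterns and a hypothetical repository layout; neither platform's actual constraints syntax is shown here.

```python
from fnmatch import fnmatchcase

# Hypothetical glob patterns for areas agents may read but not modify.
PROTECTED_PATTERNS = [
    "src/auth/*",
    "src/payments/*",
    ".env*",
    "migrations/*",
]


def violations(changed_paths):
    """Return the changed paths that fall inside a protected area."""
    return [
        path
        for path in changed_paths
        if any(fnmatchcase(path, pattern) for pattern in PROTECTED_PATTERNS)
    ]


def decide(changed_paths):
    """Escalate to a human if any protected path was modified."""
    blocked = violations(changed_paths)
    return ("escalate", blocked) if blocked else ("proceed", [])


if __name__ == "__main__":
    print(decide(["src/ui/button.py", "src/auth/login/session.py"]))
```

Note that the `*` in fnmatch patterns also matches path separators, so "src/auth/*" covers nested files. Running a check like this in CI gives a second line of defense if a prompt-level constraint is ever ignored.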

Circuit breakers

If a test suite fails on agent-authored PRs more than three times in a rolling seven-day window for the same automation, that automation pauses and posts a Slack alert. This prevents a flaky test or a misconfigured prompt from generating an endless stream of failing PRs. Someone looks at what went wrong, adjusts the automation or the prompt, and re-enables it. The circuit breaker is the difference between a noisy pipeline and an unmanageable one.
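The rolling-window rule is simple enough to sketch directly. The class below is a minimal illustration that assumes failures are recorded in chronological order; CircuitBreaker and its method names are hypothetical, and the Slack alert is left out.

```python
from collections import deque
from datetime import datetime, timedelta

WINDOW = timedelta(days=7)
MAX_FAILURES = 3  # "more than three" means the fourth failure trips the breaker


class CircuitBreaker:
    def __init__(self):
        self._failures = deque()
        self.paused = False

    def record_failure(self, when: datetime) -> bool:
        """Record a failed test run; return True if the automation is now paused."""
        self._failures.append(when)
        # Drop failures that have aged out of the rolling window.
        while self._failures and when - self._failures[0] > WINDOW:
            self._failures.popleft()
        if len(self._failures) > MAX_FAILURES:
            self.paused = True
        return self.paused

    def reset(self):
        """Re-enable after someone has adjusted the automation or the prompt."""
        self._failures.clear()
        self.paused = False


if __name__ == "__main__":
    start = datetime(2026, 6, 1)
    breaker = CircuitBreaker()
    for day in range(4):
        print(day, breaker.record_failure(start + timedelta(days=day)))
```

The breaker stays paused until reset() is called, which mirrors the manual re-enable step described above.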

Cost controls

Claude Code Routines support per-run token budgets that cap how much compute a single run can consume. Cursor enforces per-agent spend limits at the workspace level. Both platforms support daily spend cap alerts delivered via Slack. We review the weekly cost report the same way we review test pass rates: as a signal that tells us whether the automation is working efficiently or needs tuning.
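The two controls described here, a per-run cap and a daily alert, reduce to a few lines of bookkeeping. The sketch below is not either platform's API; RunBudget, BudgetExceeded, and daily_cap_alert are hypothetical names, and the token and dollar figures are made up for the example.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a run would go past its token budget."""


class RunBudget:
    def __init__(self, max_tokens: int) -> None:
        self.max_tokens = max_tokens
        self.used = 0

    def charge(self, tokens: int) -> int:
        """Record token usage and return what remains; refuse to overspend."""
        if self.used + tokens > self.max_tokens:
            raise BudgetExceeded(f"{self.used + tokens} > {self.max_tokens} tokens")
        self.used += tokens
        return self.max_tokens - self.used


def daily_cap_alert(run_costs_usd, cap_usd):
    """Return an alert message if today's spend exceeds the cap, else None."""
    total = sum(run_costs_usd)
    if total > cap_usd:
        return f"Daily agent spend ${total:.2f} exceeded cap ${cap_usd:.2f}"
    return None


if __name__ == "__main__":
    budget = RunBudget(max_tokens=1000)
    print(budget.charge(400))
    print(daily_cap_alert([10.0, 15.0], cap_usd=20.0))
```

Raising before the charge is applied keeps the recorded usage accurate, so the weekly cost report reflects what was actually spent.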

Observability

Every agent run produces a structured log. Cursor sessions include a recording that shows the full sequence of steps the agent took. Each PR includes a diff that is the authoritative record of what changed. We do a weekly review of the logs and recordings for automations that are more than two weeks old. Drift happens gradually: an agent that was producing clean PRs in week one can develop subtle inconsistencies by week six. The review catches this before it becomes a problem.

Escalation signals

Every agent we run is configured to flag uncertainty explicitly. Each pull request description includes a section called "Reviewer notes" where the agent calls out decisions it was not confident about, alternatives it considered, and any areas where it made an assumption that a reviewer should verify. Human reviewers check this section first. An agent that knows what it does not know is far more useful than one that presents every decision with equal confidence.
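If the "Reviewer notes" section follows a predictable layout, it can be pulled out of the PR description automatically, for example to surface it at the top of a review dashboard. The sketch below assumes a Markdown-style heading followed by bullet lines, which is an assumption about formatting and not something either platform guarantees; reviewer_notes is a hypothetical helper.

```python
def reviewer_notes(description: str, heading: str = "Reviewer notes") -> list:
    """Return the bullet lines under the heading, or [] if the section is missing."""
    notes = []
    in_section = False
    for line in description.splitlines():
        stripped = line.strip()
        # A heading line such as "## Reviewer notes" opens the section.
        if stripped.lstrip("#").strip().lower() == heading.lower():
            in_section = True
            continue
        # Any other heading closes it.
        if in_section and stripped.startswith("#"):
            break
        if in_section and stripped.startswith(("-", "*")):
            notes.append(stripped[1:].strip())
    return notes


if __name__ == "__main__":
    example = (
        "## Summary\n"
        "Fixes the date parser.\n\n"
        "## Reviewer notes\n"
        "- Assumed UTC for naive timestamps\n"
        "- Considered dateutil, kept stdlib\n"
    )
    for note in reviewer_notes(example):
        print(note)
```

Counting the returned notes per PR is also one way to feed an escalation-frequency metric.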

What This Does to the Team

Running three tasks in parallel felt like a step change. That is no longer the ceiling. At fleet scale with event triggers, the realistic throughput is several times higher: ten to twenty agent runs completing within a single review session. The math on what a team can ship per sprint changes substantially.

Review load reality

More PRs land in the review queue each day. But each review is faster. Agent-authored PRs are scoped: they address the stated issue, they have clear descriptions, and they include explicit uncertainty flags that tell the reviewer exactly where to focus. Review time per PR is down. Total PRs reviewed per day is up. The net effect is more work shipped per developer-day, not more exhaustion.

Quality signals to watch

Test coverage trend. Autonomous agents should not decrease it. A downward trend is an early warning that agents are writing code that bypasses tests.
Revert rate. Track agent-authored PRs that get reverted within 48 hours of merging. A rising revert rate means the review process is not catching something it should.
Escalation frequency. How often agents flag uncertainty in their Reviewer notes. A sudden spike usually means a codebase area changed in a way the automation prompt did not anticipate.
Automation hit rate. What percentage of triggered events produce a PR that merges without significant revision. This is the headline metric for whether an automation is tuned correctly.
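Two of the signals above, revert rate and automation hit rate, can be computed from a simple record per triggered run. The sketch below assumes a hypothetical AgentRun record with merge and revert timestamps and a flag for significant revision; how a team decides what counts as "significant" is left open.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class AgentRun:
    """One triggered run (hypothetical record shape)."""
    merged_at: Optional[datetime] = None
    reverted_at: Optional[datetime] = None
    significant_revision: bool = False


def revert_rate(runs: List[AgentRun], window: timedelta = timedelta(hours=48)) -> float:
    """Share of merged agent PRs reverted within the window after merging."""
    merged = [r for r in runs if r.merged_at is not None]
    if not merged:
        return 0.0
    reverted = [
        r for r in merged
        if r.reverted_at is not None and r.reverted_at - r.merged_at <= window
    ]
    return len(reverted) / len(merged)


def hit_rate(runs: List[AgentRun]) -> float:
    """Share of triggered runs whose PR merged without significant revision."""
    if not runs:
        return 0.0
    hits = [r for r in runs if r.merged_at is not None and not r.significant_revision]
    return len(hits) / len(runs)


if __name__ == "__main__":
    t0 = datetime(2026, 6, 1, 9, 0)
    runs = [
        AgentRun(merged_at=t0),
        AgentRun(merged_at=t0, reverted_at=t0 + timedelta(hours=24)),
        AgentRun(merged_at=t0, significant_revision=True),
        AgentRun(),  # triggered, never merged
    ]
    print(f"revert rate: {revert_rate(runs):.2f}, hit rate: {hit_rate(runs):.2f}")
```

The denominators differ on purpose: revert rate is measured against merged PRs, while hit rate is measured against every triggered event.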

What this means for client delivery

For clients, the change shows up as shorter iteration cycles and more predictable velocity. The bottleneck in the new model is review capacity, which is plannable. A developer can review roughly eight to twelve agent PRs per day without sacrificing depth. That number is consistent. Planning a sprint around it produces reliable outcomes. Implementation time in the old model varied by task complexity; review time in the new model is much more uniform.
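The planning arithmetic is worth writing down. The sketch below multiplies the eight-to-twelve range by the number of reviewers and review days; the function name and the example team size are assumptions, and it treats every counted day as a full review day, which a real sprint plan would discount.

```python
def sprint_review_capacity(reviewers: int, review_days: int, per_day=(8, 12)):
    """Return the (low, high) number of agent PRs a team can review in a sprint."""
    low, high = per_day
    return reviewers * review_days * low, reviewers * review_days * high


if __name__ == "__main__":
    # Hypothetical team: 3 reviewers, 10 working days in the sprint.
    low, high = sprint_review_capacity(reviewers=3, review_days=10)
    print(f"{low} to {high} agent PRs per sprint")
```

For three reviewers over ten days this gives a range of 240 to 360 PRs, which is the ceiling to plan triggers and concurrency limits against.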

Getting Started: Pilot, Systematize, Automate

The teams doing this well in 2026 are not the ones who adopted every feature at once. They are the ones who piloted one automation, measured it honestly, and expanded from there.

Adoption Roadmap

Phase 01: Pilot (30-day trial)
One automation, one repo
One trigger type
Measure quality metrics
Unlocks: Governance layer

Phase 02: Systematic (team rollout)
Multiple trigger types
Governance constraints added
Team trained on agent PRs
Unlocks: Fleet orchestration

Phase 03: Autonomous (always-on)
Fleet orchestration active
Always-on memory-backed runs
Routine maintenance automated

Choosing a platform

If the team is already working in the Cursor IDE daily, Cursor cloud agents and Automations are the natural entry point. The tooling is already in context and the configuration surfaces are familiar. If the team is CLI-comfortable or uses multiple editors, Claude Code cloud agents and Routines offer the same pipeline with a configuration model that lives alongside the codebase. Both are production-ready. Starting with one and proving the model is more valuable than debating which to choose.

Prerequisites worth checking first

Test coverage as a real gate. If your test suite is sparse or unreliable, automated tests cannot serve as the quality gate they need to be. Fix this before turning on automations.
Linear ticket hygiene. Agents work from the ticket description. Vague acceptance criteria produce vague PRs. A short audit of how tickets are written often reveals the fastest path to better agent output.
Review culture ready for throughput. A team that struggles to clear a backlog of human PRs will not suddenly find capacity for agent PRs. Review capacity is the new constraint; it needs to be planned for explicitly.

The three-phase progression

Phase 1 (Pilot) is one automation on one repository with one trigger type. Run it for 30 days. Measure the automation hit rate, the revert rate, and the review time. Use those numbers to tune the prompt and the constraints before expanding. Phase 2 (Systematic) adds governance: policy constraints, circuit breakers, cost controls, and team training on how to review agent PRs efficiently. Multiple trigger types go live only after governance is in place. Phase 3 (Autonomous) is the fleet: multiple automations and routines across multiple repositories, always-on memory, and a weekly observability review as the primary maintenance cadence.

The Background Team Is Already Running

Not long ago, every agent run started with a developer pressing go. That is no longer the default. Cloud agents and automation layers from both Cursor and Claude Code mean the engineering team that works while your developers sleep is no longer a hypothetical. It is a configuration decision.

The teams getting ahead of this right now are not the ones with the most agents. They are the ones with the clearest governance. Policy constraints, test gates, escalation paths, and cost controls are what separate a reliable always-on pipeline from a noisy one. The technology is ready. The question is whether the process around it is.

If you want to understand where your team sits on that spectrum and what a sequenced adoption looks like for your specific stack, the AI and Tech Strategy Consultation assessment is built exactly for this. Reach out and we can map it out together.

