In April we described a workflow where a developer assigns a ticket, walks away for a coffee, and comes back to a pull request. That was still a human-started process. Things changed quickly in the couple of months that followed. Cursor shipped cloud agents and Automations. Anthropic shipped Claude Code managed agents and Routines. Both platforms moved the trigger point from "developer presses go" to "an event in the system presses go."
That shift touches every layer of the development process: how cloud agents and automation layers work on each platform, how we added governance when agents started running without us, and what the event-driven software development lifecycle (SDLC) looks like for a team ready to operate at this level. The background engineering team is no longer theoretical.
From Assign-and-Wait to Always-On
The April 2026 workflow was already fast. A developer assigned a ticket, an agent worked through it, and a pull request appeared in under an hour. But the developer still started every run. That is the part that changed.
Cursor shipped cloud agents alongside a new Automations feature. Anthropic shipped cloud agents for Claude Code alongside Routines. Both platforms landed in the same window with the same fundamental shift: agents that wake up on their own, triggered by events in your existing tools rather than by a developer opening an IDE.
Q1 2026: Developer-Initiated (human starts every run)
June 2026: Event-Driven (system event starts the run)
The internal numbers from Cursor underscore how quickly this model matures in production. More than 35% of merged PRs at Cursor itself now come from autonomous agents running without a developer present. That is not a benchmark from a controlled experiment. That is the share of shipped code that started from an event, not a human.
The practical implication is straightforward. Work that used to wait in the queue until a developer had capacity now starts the moment a trigger condition is met. A Linear or GitHub issue is created, a PagerDuty alert fires, a PR merges, and an agent is already picking up the task before anyone checks the board.
The Engines: What Cloud Agents Actually Do
Cloud agents are a step change from local agent runs. They execute in isolated virtual machines with no dependency on a developer having an IDE open. The agent clones the repository, reads the codebase, writes and tests code, opens a pull request, and notifies our team in Slack. All of that happens remotely, whether the developer is at their desk or not.
Cloud Agent Platforms
Cursor Cloud Agents
- Primary interface: Cursor IDE
- Runtime: Isolated cloud VM
- GitHub integration: Native (branches, PRs)
- Event triggers: Linear, Slack, GitHub, PagerDuty, cron
- Context memory: Session + codebase context
- Notifications: Slack, GitHub PR comments
Claude Code Cloud Agents
- Primary interface: CLI + Claude.ai web
- Runtime: Isolated cloud VM
- GitHub integration: Native (branches, PRs)
- Event triggers: Scheduled routines + event-triggered runs
- Context memory: Session + codebase context
- Notifications: Slack, GitHub PR comments
Cursor cloud agents
Cursor cloud agents integrate natively with Linear, GitHub, and the project codebase. Long-horizon work is handled through self-summarization: the agent periodically compresses its working context so it can maintain coherence across multi-hour sessions without losing the thread of what it was doing. The 35% autonomous PR rate reflects how reliably this holds up in production.
Claude Code cloud agents
Claude Code cloud agents offer a parallel capability set: clone the repo, understand context, write code, open PRs, post to Slack. The primary interface is the Claude Code CLI or the Claude.ai web app rather than an IDE. Teams already invested in Cursor tend to stay there. Teams that prefer a CLI-first or cross-editor workflow get the same cloud agent capability from Anthropic directly. Both are production-viable today. The pipeline is identical: repository to code to PR to notification. The entry point differs.
The most important practical consequence of cloud agents is asynchronous development. A developer's attention is no longer the bottleneck for any individual task. The agent works while the developer reviews other PRs, handles architecture decisions, or ends their day. The work does not wait.
The Triggers: Automations, Routines, and Always-On Memory
Cloud agents run the task. Automations and Routines decide when the task starts. This is the layer that turns a capable agent into an always-on team member.
Event-Driven Agent Triggers: Event Sources → Automation / Routine → Agent Tasks
Cursor Automations
A Cursor Automation is a definition with three parts: a trigger event, an agent prompt describing the task, and a set of constraints on what the agent is allowed to do. When the trigger fires, the automation launches the agent with those parameters. Trigger options include: a Linear issue created, a PR merged on GitHub, a Slack message matching a pattern, a PagerDuty alert crossing a threshold, and scheduled cron expressions. The automation runs without any developer involvement beyond the initial setup.
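The three-part shape described above (trigger, prompt, constraints) can be modeled in a few lines. This is a minimal sketch and not Cursor's actual configuration format: the `Automation` class, its field names, and event type strings such as `linear.issue.created` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Automation:
    """Hypothetical model of an automation: trigger + prompt + constraints."""
    name: str
    trigger: str                  # event type that starts a run (illustrative strings)
    prompt: str                   # task description handed to the agent
    forbidden_paths: tuple = ()   # constraints: areas the agent may not modify


def matching_automations(event_type, automations):
    """Return the automations whose trigger matches the incoming event type."""
    return [a for a in automations if a.trigger == event_type]


AUTOMATIONS = [
    Automation(
        name="bug-triage",
        trigger="linear.issue.created",
        prompt="Run an initial triage analysis and post the findings.",
        forbidden_paths=("auth/", "payments/"),
    ),
    Automation(
        name="coverage-check",
        trigger="github.pr.opened",
        prompt="Check test coverage for the changed files and flag gaps.",
    ),
]

# An incoming event launches every automation registered for its type.
launched = [a.name for a in matching_automations("linear.issue.created", AUTOMATIONS)]
```

Keeping the definition as plain data makes it easy to review what each automation is allowed to do before it ever runs.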
Claude Code Routines
Claude Code Routines follow the same model. A routine defines a trigger or schedule, the agent task to execute, and the context constraints. Routines can be defined via the Claude Desktop App, Claude Code CLI or the Claude.ai web interface and persist across sessions with accumulated codebase context. Where Automations are configured in the Cursor interface, Routines live in the Claude Code configuration and can be version-controlled alongside the project.
Memory compounds the value
Both platforms accumulate knowledge across runs. An agent that has touched a codebase 20 times understands its conventions, its naming patterns, and the decisions made in past PRs in a way that a fresh context window cannot replicate. By run 20, agents on our stack produce PRs that require meaningfully less review time than they did at run one. The improvement is real and it compounds.
A few examples from our current stack: a scheduled weekly SEO audit that opens a PR with results and technical fixes (if applicable), a Linear bug-label automation that runs an initial triage analysis and posts findings as a PR comment before any developer looks at the ticket, and an on-PR-open routine that checks test coverage for the changed files and flags gaps in the PR description.
We have also helped a marketing team set up weekly routines that analyze their client's site content and relevant industry trends, then recommend blog posts that close gaps in the content strategy and improve SEO.
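The on-PR-open coverage check mentioned above reduces to a small comparison once coverage data is available. The sketch below assumes coverage has already been collected into a mapping of file path to coverage ratio. The function names, the sample paths, and the 80% threshold are illustrative assumptions, not part of either platform.

```python
def flag_coverage_gaps(changed_files, coverage_by_file, threshold=0.8):
    """Return changed files whose coverage is missing or below the threshold.

    coverage_by_file maps a file path to a coverage ratio in [0, 1].
    Files absent from the map are treated as having no coverage data.
    """
    gaps = []
    for path in changed_files:
        ratio = coverage_by_file.get(path)
        if ratio is None:
            gaps.append((path, "no coverage data"))
        elif ratio < threshold:
            gaps.append((path, f"{ratio:.0%} covered"))
    return gaps


def render_gap_note(gaps):
    """Format the gaps as a short note suitable for a PR description."""
    if not gaps:
        return "Coverage check: no gaps found in changed files."
    lines = ["Coverage check: gaps found in changed files."]
    lines += [f"- {path}: {reason}" for path, reason in gaps]
    return "\n".join(lines)


gaps = flag_coverage_gaps(
    changed_files=["app/cart.py", "app/search.py", "app/new_feature.py"],
    coverage_by_file={"app/cart.py": 0.92, "app/search.py": 0.55},
)
note = render_gap_note(gaps)
```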
Where the Pipeline Lives Now
The SDLC we operate today has three characteristics that did not exist six months ago. Work starts from system signals rather than developer decisions. Multiple agents work in parallel rather than sequentially. And human time has shifted almost entirely to review, architecture, and escalation handling.
Parallel agent fleets
Our earlier post, "From Ticket to Deployed: Our Agentic AI Development Workflow," described running three tasks in parallel as a capacity gain. With event-driven automations and routines, the practical number of concurrent tasks is no longer three. It is whatever the review queue can absorb. On an active project day we routinely have ten or more agent runs completing within the same review window. The constraint is not execution capacity. The constraint is how many PRs a developer can thoughtfully review in a day.
Orchestration patterns
We structure the queue with a few explicit rules. Bug reports and regressions get a higher priority trigger than feature work, which means they start sooner and appear earlier in the review queue. Concurrency limits per repository prevent ten agents from racing to modify the same files at once. And we maintain a list of ticket types that require a developer to start the work manually: any task touching authentication, payment flows, or security-sensitive configuration. Automations handle the routine; humans handle the consequential.
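The three rules above (priority by ticket type, per-repository concurrency limits, and a manual-only list) can be sketched as a small priority queue. This is an illustration of the idea under stated assumptions, not our production scheduler: the class name, the label strings, and the limit of two runs per repository are all hypothetical.

```python
import heapq
import itertools

# Assumed labels and limits, for illustration only.
PRIORITY = {"bug": 0, "regression": 0, "feature": 1}   # lower number runs first
MANUAL_ONLY = {"authentication", "payments", "security-config"}
MAX_RUNS_PER_REPO = 2


class AgentQueue:
    def __init__(self):
        self._heap = []
        self._order = itertools.count()   # tie-breaker keeps FIFO order within a priority
        self.active = {}                  # repo -> number of runs in flight
        self.needs_human = []             # tickets a developer must start manually

    def submit(self, ticket_id, repo, kind, area=None):
        if area in MANUAL_ONLY:
            self.needs_human.append(ticket_id)
            return
        entry = (PRIORITY.get(kind, 1), next(self._order), ticket_id, repo)
        heapq.heappush(self._heap, entry)

    def next_run(self):
        """Pop the highest-priority ticket whose repo is under its concurrency limit."""
        skipped, chosen = [], None
        while self._heap:
            item = heapq.heappop(self._heap)
            repo = item[3]
            if self.active.get(repo, 0) < MAX_RUNS_PER_REPO:
                self.active[repo] = self.active.get(repo, 0) + 1
                chosen = item[2]
                break
            skipped.append(item)
        for item in skipped:              # put blocked tickets back on the heap
            heapq.heappush(self._heap, item)
        return chosen

    def finish(self, repo):
        """Mark one run on the repo as complete, freeing a concurrency slot."""
        self.active[repo] -= 1


q = AgentQueue()
q.submit("FEAT-1", "web", "feature")
q.submit("BUG-7", "web", "bug")
q.submit("SEC-3", "web", "feature", area="authentication")
q.submit("FEAT-2", "web", "feature")
q.submit("FEAT-9", "api", "feature")
started = [q.next_run() for _ in range(4)]
```

In this example the bug starts first, the authentication ticket is routed to a human, and the third `web` ticket waits until a slot on that repository frees up.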
Where developers spend time now
Reviewing agent PRs is faster than reviewing human PRs. Agents produce scoped diffs with clear descriptions and explicit uncertainty flags. A typical review takes five to fifteen minutes rather than forty-five. The developer time that used to go into writing the code now goes into evaluating it. Architecture decisions that span multiple agent outputs, edge cases the agent flagged but could not resolve, and the occasional escalation where the agent explicitly asked for human judgment: these are where senior engineering time concentrates.
Governance at Agent Scale
When agents run continuously without a developer present, governance is not a nice-to-have. It is what keeps the pipeline reliable. We added five layers of control as we scaled from occasional agent runs to an always-on fleet.
Governance Layers
- Merge Approval + Deploy: One human sign-off, then ships
- PR Review + Escalation Flags: Agent surfaces decisions it was uncertain about
- Automated Test Gate: CI must pass, no exceptions
- Policy Constraints: What the agent is and is not allowed to touch
- Agent Output: Raw code changes on a feature branch
Policy constraints
Every automation and routine we run includes a constraints block that specifies what the agent is not allowed to touch. Authentication modules, payment integrations, environment configuration files, and database migration scripts are off-limits for autonomous agents. The agent can read these files for context but cannot modify them. Any task that would require touching a constrained area is escalated to the queue for human assignment.
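One way to enforce a constraints block like this is to check the set of files a run wants to modify against a list of protected path patterns. The sketch below is an assumption about how such a check could look, not the format either platform uses. The patterns and function names are hypothetical, and reading protected files is deliberately not restricted.

```python
from fnmatch import fnmatchcase

# Hypothetical path patterns standing in for the protected areas named above.
PROTECTED_PATTERNS = [
    "src/auth/*",
    "src/payments/*",
    ".env*",
    "db/migrations/*",
]


def violations(changed_paths, patterns=PROTECTED_PATTERNS):
    """Return the changed paths that fall inside a protected area."""
    return [
        path for path in changed_paths
        if any(fnmatchcase(path, pattern) for pattern in patterns)
    ]


def decide(changed_paths):
    """Allow the change, or escalate to a human if any protected path is modified."""
    blocked = violations(changed_paths)
    if blocked:
        return {"action": "escalate", "blocked": blocked}
    return {"action": "allow", "blocked": []}
```

Note that `*` in `fnmatchcase` also matches path separators, so `src/auth/*` covers nested files such as `src/auth/session/token.py`.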
Circuit breakers
If a test suite fails on agent-authored PRs more than three times in a rolling seven-day window for the same automation, that automation pauses and posts a Slack alert. This prevents a flaky test or a misconfigured prompt from generating an endless stream of failing PRs. Someone looks at what went wrong, adjusts the automation or the prompt, and re-enables it. The circuit breaker is the difference between a noisy pipeline and an unmanageable one.
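The rule above (more than three failures in a rolling seven-day window pauses the automation) maps onto a small sliding-window counter. This is a minimal sketch with hypothetical names. It assumes failures are recorded in chronological order and leaves the Slack alert to the caller.

```python
from collections import deque
from datetime import datetime, timedelta


class AutomationCircuitBreaker:
    """Pause an automation after too many test failures in a rolling window."""

    def __init__(self, max_failures=3, window=timedelta(days=7)):
        self.max_failures = max_failures
        self.window = window
        self.failures = deque()   # timestamps of recent failing runs
        self.paused = False

    def record_failure(self, when):
        """Record a failing test run; return True if this failure trips the breaker."""
        self.failures.append(when)
        # Drop failures that have aged out of the rolling window.
        while self.failures and when - self.failures[0] > self.window:
            self.failures.popleft()
        if not self.paused and len(self.failures) > self.max_failures:
            self.paused = True
            return True   # the caller would post the Slack alert here
        return False

    def reset(self):
        """Re-enable the automation after a human has reviewed the failures."""
        self.failures.clear()
        self.paused = False


breaker = AutomationCircuitBreaker()
start = datetime(2026, 6, 1)
# Four failures on consecutive days: the fourth one trips the breaker.
results = [breaker.record_failure(start + timedelta(days=d)) for d in range(4)]
```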
Cost controls
Claude Code Routines support per-run token budgets that cap how much compute a single run can consume. Cursor enforces per-agent spend limits at the workspace level. Both platforms support daily spend cap alerts delivered via Slack. We review the weekly cost report the same way we review test pass rates: as a signal that tells us whether the automation is working efficiently or needs tuning.
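The two controls described above, a per-run cap and a daily spend alert, follow a simple pattern regardless of platform. The sketch below illustrates that pattern only. `RunBudget`, `BudgetExceeded`, and `daily_spend_alert` are hypothetical helpers and do not correspond to any Cursor or Claude Code API.

```python
class BudgetExceeded(Exception):
    """Raised when a run would exceed its token budget."""


class RunBudget:
    """Track token use for one run against a fixed cap (hypothetical helper)."""

    def __init__(self, max_tokens):
        self.max_tokens = max_tokens
        self.used = 0

    def charge(self, tokens):
        """Record token use and return the remaining budget, or raise if over the cap."""
        if self.used + tokens > self.max_tokens:
            raise BudgetExceeded(
                f"run would use {self.used + tokens} tokens, cap is {self.max_tokens}"
            )
        self.used += tokens
        return self.max_tokens - self.used


def daily_spend_alert(run_costs, daily_cap):
    """Return an alert message when the day's total spend passes the cap, else None."""
    total = sum(run_costs)
    if total > daily_cap:
        return f"Daily agent spend {total:.2f} exceeded cap {daily_cap:.2f}"
    return None
```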
Observability
Every agent run produces a structured log. Cursor sessions include a recording that shows the full sequence of steps the agent took. Each PR includes a diff that is the authoritative record of what changed. We do a weekly review of the logs and recordings for automations that are more than two weeks old. Drift happens gradually: an agent that was producing clean PRs in week one can develop subtle inconsistencies by week six. The review catches this before it becomes a problem.
Escalation signals
Every agent we run is configured to flag uncertainty explicitly. Each pull request description includes a section called "Reviewer notes" where the agent calls out decisions it was not confident about, alternatives it considered, and any areas where it made an assumption that a reviewer should verify. Human reviewers check this section first. An agent that knows what it does not know is far more useful than one that presents every decision with equal confidence.
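Because the section has a fixed name, tooling can pull it out of the PR description and surface it to reviewers first. The sketch below assumes the description is markdown and that the section begins with a heading line such as `## Reviewer notes` followed by bullet items. Both the heading format and the function name are assumptions for illustration.

```python
import re


def extract_reviewer_notes(pr_description):
    """Return the bullet items under a 'Reviewer notes' heading, or [] if absent.

    Assumes a markdown-style description where the section starts with a
    heading line and runs until the next heading or the end of the text.
    """
    match = re.search(
        r"^#+\s*Reviewer notes\s*$(.*?)(?=^#+\s|\Z)",
        pr_description,
        flags=re.MULTILINE | re.DOTALL | re.IGNORECASE,
    )
    if not match:
        return []
    return [
        line.lstrip("-* ").strip()
        for line in match.group(1).splitlines()
        if line.strip().startswith(("-", "*"))
    ]


description = (
    "## Summary\n"
    "Fix cart total rounding.\n\n"
    "## Reviewer notes\n"
    "- Assumed prices are stored in cents.\n"
    "- Considered Decimal but kept integers.\n\n"
    "## Testing\n"
    "- unit tests added\n"
)
notes = extract_reviewer_notes(description)
```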
What This Does to the Team
Running three tasks in parallel felt like a step change. That is no longer the ceiling. At fleet scale with event triggers, the realistic throughput is several times higher: ten to twenty agent runs completing within a single review session. The math on what a team can ship per sprint changes substantially.
Review load reality
More PRs land in the review queue each day. But each review is faster. Agent-authored PRs are scoped: they address the stated issue, they have clear descriptions, and they include explicit uncertainty flags that tell the reviewer exactly where to focus. Review time per PR is down. Total PRs reviewed per day is up. The net effect is more work shipped per developer-day, not more exhaustion.
Quality signals to watch
- Test coverage trend. Autonomous agents should not decrease it. A downward trend is an early warning that agents are writing code that bypasses tests.
- Revert rate. Track agent-authored PRs that get reverted within 48 hours of merging. A rising revert rate means the review process is not catching something it should.
- Escalation frequency. How often agents flag uncertainty in their Reviewer notes. A sudden spike usually means a codebase area changed in a way the automation prompt did not anticipate.
- Automation hit rate. What percentage of triggered events produce a PR that merges without significant revision. This is the headline metric for whether an automation is tuned correctly.
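Two of the signals above, revert rate and automation hit rate, are simple ratios once each run is recorded. The sketch below assumes a hypothetical `AgentRun` record. The field names are illustrative, and what counts as "significant revision" is left to the team to define.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class AgentRun:
    """One triggered event and its outcome (hypothetical record shape)."""
    produced_pr: bool
    merged_at: Optional[datetime] = None
    reverted_at: Optional[datetime] = None
    significant_revision: bool = False


def revert_rate(runs, window=timedelta(hours=48)):
    """Share of merged agent PRs reverted within the window after merging."""
    merged = [r for r in runs if r.merged_at is not None]
    if not merged:
        return 0.0
    reverted = [
        r for r in merged
        if r.reverted_at is not None and r.reverted_at - r.merged_at <= window
    ]
    return len(reverted) / len(merged)


def hit_rate(runs):
    """Share of triggered events whose PR merged without significant revision."""
    if not runs:
        return 0.0
    hits = [r for r in runs if r.merged_at is not None and not r.significant_revision]
    return len(hits) / len(runs)


t = datetime(2026, 6, 1, 9, 0)
sample_runs = [
    AgentRun(True, merged_at=t),
    AgentRun(True, merged_at=t, reverted_at=t + timedelta(hours=5)),
    AgentRun(True, merged_at=t, significant_revision=True),
    AgentRun(False),   # the trigger fired but no PR was produced
]
```

With this sample, one of three merged PRs was reverted inside the window and two of four triggered events count as hits.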
What this means for client delivery
For clients, the change shows up as shorter iteration cycles and more predictable velocity. The bottleneck in the new model is review capacity, which is plannable. A developer can review roughly eight to twelve agent PRs per day without sacrificing depth. That number is consistent. Planning a sprint around it produces reliable outcomes. Implementation time in the old model varied by task complexity; review time in the new model is much more uniform.
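The planning arithmetic is short enough to write down. The sketch below uses the eight to twelve PRs per developer-day figure from above. The team size and sprint length in the example are assumptions, and the result is an upper bound because it treats every working day as fully available for review.

```python
def sprint_review_capacity(reviewers, working_days, prs_per_day=(8, 12)):
    """Return the (low, high) number of agent PRs a team can review in one sprint."""
    low, high = prs_per_day
    return reviewers * working_days * low, reviewers * working_days * high


# Example: 3 reviewers and 10 working days in a two-week sprint.
low, high = sprint_review_capacity(reviewers=3, working_days=10)
```

For that example team the range works out to 240 to 360 reviewed PRs per sprint, which is the ceiling to plan automation volume against.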
Getting Started: Pilot, Systematize, Automate
The teams doing this well in 2026 are not the ones who adopted every feature at once. They are the ones who piloted one automation, measured it honestly, and expanded from there.
Adoption Roadmap
Phase 1 (Pilot)
- One automation, one repo
- One trigger type
- Measure quality metrics
Unlocks: Governance layer
Phase 2 (Systematic)
- Multiple trigger types
- Governance constraints added
- Team trained on agent PRs
Unlocks: Fleet orchestration
Phase 3 (Autonomous)
- Fleet orchestration active
- Always-on memory-backed runs
- Routine maintenance automated
Choosing a platform
If the team is already working in Cursor IDE daily, Cursor cloud agents and Automations are the natural entry point. The tooling is already in context and the configuration surfaces are familiar. If the team is CLI-comfortable or uses multiple editors, Claude Code cloud agents and Routines offer the same pipeline with a configuration model that lives alongside the codebase. Both are production-ready. Starting with one and proving the model is more valuable than debating which to choose.
Prerequisites worth checking first
- Test coverage as a real gate. If your test suite is sparse or unreliable, automated tests cannot serve as the quality gate they need to be. Fix this before turning on automations.
- Linear ticket hygiene. Agents work from the ticket description. Vague acceptance criteria produce vague PRs. A short audit of how tickets are written often reveals the fastest path to better agent output.
- Review culture ready for throughput. A team that struggles to clear a backlog of human PRs will not suddenly find capacity for agent PRs. Review capacity is the new constraint; it needs to be planned for explicitly.
The three-phase progression
- Phase 1 (Pilot) is one automation on one repository with one trigger type. Run it for 30 days. Measure the automation hit rate, the revert rate, and the review time. Use those numbers to tune the prompt and the constraints before expanding.
- Phase 2 (Systematic) adds governance: policy constraints, circuit breakers, cost controls, and team training on how to review agent PRs efficiently. Multiple trigger types go live only after governance is in place.
- Phase 3 (Autonomous) is the fleet: multiple automations and routines across multiple repositories, always-on memory, and a weekly observability review as the primary maintenance cadence.
The Background Team Is Already Running
Not long ago, every agent run started with a developer pressing go. That is no longer the default. Cloud agents and automation layers from both Cursor and Claude Code mean the engineering team that works while your developers sleep is no longer a hypothetical. It is a configuration decision.
The teams getting ahead of this right now are not the ones with the most agents. They are the ones with the clearest governance. Policy constraints, test gates, escalation paths, and cost controls are what separate a reliable always-on pipeline from a noisy one. The technology is ready. The question is whether the process around it is.
If you want to understand where your team sits on that spectrum and what a sequenced adoption looks like for your specific stack, the AI and Tech Strategy Consultation assessment is built exactly for this. Reach out and we can map it out together.