Human-in-the-Loop for AI Browser Agents
AI browser agents are good at navigating predictable flows: filling forms, clicking buttons, extracting data from structured pages. But the web is full of unpredictable interruptions -- CAPTCHAs, login walls, cookie consent dialogs, 2FA prompts, age gates, "are you still there" modals, and page layouts that have changed since the agent was last tested.
Human-in-the-loop (HITL) is the pattern where an agent recognizes it is stuck, pauses, hands control to a human, and resumes after the human resolves the blocker. This guide covers when you need HITL, how to architect it, and the trade-offs between building it yourself and using an existing service.
When Browser Agents Need Humans
Not every agent failure needs a human. Some failures can be retried, some can be worked around programmatically, and some are permanent. HITL is the right tool for:
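- CAPTCHAs that resist automated solving -- Cloudflare Turnstile, reCAPTCHA challenges, and behavioral checks that require real browser interaction. (reCAPTCHA v3 itself is score-based and shows no puzzle; the blocker a human sees is whatever challenge or block page the site serves after a low score.)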
- Two-factor authentication -- SMS codes, push notifications, and hardware security keys require a person with access to the physical device.
- Login and identity verification -- sites that require manual credential entry, security questions, or identity document upload.
- Unexpected UI changes -- a site redesign broke the agent's selectors, and a human can navigate the new layout while you update the code.
- Rate limiting and soft blocks -- some sites present "please try again later" or queueing pages that a human can wait through or bypass.
- Complex decision points -- the agent reaches a fork it was not trained for and needs human judgment about which path to take.
The common thread is situations where the agent lacks either the capability (solving a CAPTCHA) or the authority (choosing between ambiguous options) to proceed on its own.
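The triage described above can be sketched as a small routing function. This is a minimal illustration, not a definitive taxonomy: the failure kinds, the set names, and the `triageFailure` function are all hypothetical and would need to match whatever error labels your agent actually produces.

```javascript
// Hypothetical failure labels; adjust to the errors your agent reports.
const RETRYABLE = new Set(['network_timeout', 'navigation_timeout', 'stale_element']);
const WORKAROUND = new Set(['cookie_banner', 'newsletter_popup']);
const NEEDS_HUMAN = new Set([
  'captcha', '2fa', 'login_required', 'ui_changed', 'soft_block', 'ambiguous_choice',
]);

// Returns 'retry' | 'workaround' | 'escalate' | 'fail'.
function triageFailure({ kind, attempts = 0, maxRetries = 2 }) {
  // Transient errors are retried until the retry budget is spent.
  if (RETRYABLE.has(kind) && attempts < maxRetries) return 'retry';
  // Some interruptions can be dismissed in code without a human.
  if (WORKAROUND.has(kind)) return 'workaround';
  // Missing capability or missing authority: hand off to a person.
  if (NEEDS_HUMAN.has(kind)) return 'escalate';
  // Everything else is treated as permanent.
  return 'fail';
}
```

For example, `triageFailure({ kind: 'captcha' })` routes to `'escalate'`, while an unknown kind such as `'page_not_found'` falls through to `'fail'`.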
Architecture Patterns
There are several ways to implement HITL for browser agents, from simple to sophisticated.
Pattern 1: Pause and Notify
The simplest approach. The agent detects it is stuck, sends a notification (Slack, email, PagerDuty), and enters a polling loop. A human opens the browser manually -- typically through a VNC session or remote desktop -- solves the problem, and signals the agent to continue.
Pros: Simple to build. No special infrastructure needed. Works with any browser setup.
Downsides: The human needs VNC/RDP access to wherever the browser is running. Notification-to-resolution latency is high (minutes to hours). No structured handoff -- the human has to figure out what is wrong by looking at the screen.
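A minimal sketch of the pause-and-notify loop, assuming the caller supplies two async functions: `notify` (for example, a Slack webhook post) and `isResolved` (for example, a check of a flag the human sets when done). Both names, and `pauseAndNotify` itself, are illustrative rather than part of any library.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// notify: async (message) => void      (hypothetical, injected by the caller)
// isResolved: async () => boolean      (hypothetical, injected by the caller)
async function pauseAndNotify({
  notify,
  isResolved,
  reason,
  pollMs = 5000,
  timeoutMs = 30 * 60 * 1000,
}) {
  // Tell a human what went wrong, then wait for their signal.
  await notify(`Agent stuck: ${reason}`);
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (await isResolved()) return { resumed: true };
    await sleep(pollMs);
  }
  // Nobody responded in time; the caller decides what to do next.
  return { resumed: false, error: 'timeout' };
}
```

Including the reason in the notification partly addresses the lack of a structured handoff: the human at least knows what to look for before opening the remote session.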
Pattern 2: CDP-Based Session Sharing
The agent exposes the browser's Chrome DevTools Protocol (CDP) endpoint. When it gets stuck, a human connects to that same browser session via CDP from their own machine, solves the problem, and disconnects. The agent detects the page state has changed and continues.
Pros: The human interacts with the actual browser session, not a screen share. No VNC infrastructure. Works with cloud browsers (Browserbase, Browserless) that already expose CDP URLs.
Downsides: You need to build the connection handoff, the notification system, the UI for the human operator, and the state detection to know when the human is done.
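The state-detection piece is the easiest to sketch. The example below assumes a Puppeteer- or Playwright-style `page` object whose `page.$(selector)` resolves to `null` when nothing matches, and it treats "the blocker element is gone" as the signal that the human has finished. The function name and the choice of selector are illustrative assumptions.

```javascript
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Resolves true once `blockerSelector` no longer matches, false on timeout.
async function waitForBlockerCleared(
  page,
  blockerSelector,
  { pollMs = 2000, timeoutMs = 10 * 60 * 1000 } = {}
) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    try {
      const blocker = await page.$(blockerSelector);
      if (blocker === null) return true; // blocker is gone: the human is done
    } catch (err) {
      // The page may be mid-navigation while the human works; keep polling.
    }
    await delay(pollMs);
  }
  return false;
}
```

Polling for the absence of the blocker only proves that the page changed. In practice you would likely follow it with a check that the page reached the state the agent expected.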
Pattern 3: Managed HITL Service
A third-party service handles the entire workflow: the agent makes an API call with its browser session, the service connects a human operator, the human solves the blocker, and the API call returns with the result.
This is what Pilot does. One API call, blocking, with a timeout:
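// Assumption: the import and the opening of the constructor call are missing
// from this excerpt; the class name `Pilot` is a guess based on the variable name.
const pilot = new Pilot({
  apiKey: 'pk_your_key'
});

// Agent detects it is stuck
const result = await pilot.rescue(page, 'Cloudflare challenge on target site');

if (result.solved) {
  // Page is now past the blocker -- continue automation
  await page.waitForSelector('.dashboard');
} else {
  // result.error: "unsolvable" | "timeout" | "browser_died"
  console.log('Rescue failed:', result.error);
}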
Pros: Minimal integration effort. No operator UI to build or maintain. The service handles operator availability and assignment.
Downsides: Per-solve cost. Dependency on a third party. Not suitable if you have strict data residency requirements that prohibit external access to the browser session.
Build vs. Buy
The decision comes down to volume and complexity. Here is a realistic comparison:
| Consideration | Build It Yourself | Use a Service (e.g. Pilot) |
|---|---|---|
| Integration time | 1-3 weeks for a basic system | Under an hour |
| Operator management | You recruit, train, and schedule operators | Handled by the service |
| Operator UI | Build a web app with CDP viewer | Included |
| Notification system | Build Slack/email/pager integration | Included |
| Cost at 50 solves/month | Engineering time + operator wages | ~$49/month |
| Cost at 5000 solves/month | Amortized -- potentially cheaper | Higher, but predictable |
| Data control | Full control | Third party sees browser session |
For most teams, the math favors using a service until volume justifies the engineering investment of building in-house. A working HITL system requires not just the technical plumbing but a reliable pool of human operators available when agents get stuck -- which is often outside business hours.
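The volume argument can be made concrete with a rough break-even calculation. This is a sketch only: the function name and every input are placeholders you would replace with your own estimates, and it ignores factors such as maintenance effort and operator availability.

```javascript
// Months until a one-off build cost is repaid by lower monthly spend.
// All inputs are caller-supplied estimates in the same currency.
function breakEvenMonths({ buildCost, inHouseMonthly, serviceMonthly }) {
  const monthlySavings = serviceMonthly - inHouseMonthly;
  // If running in-house costs as much as the service, building never pays back.
  if (monthlySavings <= 0) return Infinity;
  return Math.ceil(buildCost / monthlySavings);
}
```

With hypothetical inputs of a 20000 build cost, 1500 per month for in-house operators, and 2500 per month for a service, `breakEvenMonths` returns 20. With the ~$49/month figure from the table and the same in-house cost, it returns `Infinity`, which is the low-volume case where buying wins.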
Detection: Knowing When to Escalate
The hardest part of HITL is often not the human handoff itself but detecting that the agent needs help. Good detection strategies:
- Expected state assertions -- after each action, verify that the page is in the expected state. If it has not reached that state within a timeout, treat the agent as stuck.
- Known blocker patterns -- check for CAPTCHA iframes, 2FA input fields, login forms, and common modal selectors after each navigation.
- LLM-based assessment -- take a screenshot and ask the agent's LLM "is this page showing a blocker? If so, what kind?" This catches novel blockers that DOM selectors would miss.
- Progress monitoring -- if the agent has not completed a step in N seconds, assume it is stuck and escalate.
async function checkForBlockers(page) {
  // Check known CAPTCHA selectors
  const captcha = await page.$(
    'iframe[src*="recaptcha"], .cf-turnstile, .h-captcha'
  );
  if (captcha) return { stuck: true, reason: 'captcha' };

  // Check for 2FA prompts
  const twoFa = await page.$(
    'input[name="otp"], .two-factor-prompt'
  );
  if (twoFa) return { stuck: true, reason: '2fa' };

  // Check for unexpected login page
  const url = page.url();
  if (url.includes('/login') || url.includes('/signin')) {
    return { stuck: true, reason: 'login_required' };
  }

  return { stuck: false };
}
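The selector check above covers known blocker patterns. The expected-state assertion strategy can be sketched in the same style. Here `step` and `isExpectedState` are caller-supplied async functions (hypothetical names), and the return shape mirrors `checkForBlockers` so both results can feed the same escalation logic.

```javascript
const pause = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Runs one agent step, then polls until the page reaches the expected state.
async function runStepWithAssertion(
  step,
  isExpectedState,
  { timeoutMs = 15000, pollMs = 500 } = {}
) {
  await step();
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    // Check at least once, even with a zero timeout.
    if (await isExpectedState()) return { stuck: false };
    if (Date.now() >= deadline) {
      return { stuck: true, reason: 'unexpected_state' };
    }
    await pause(pollMs);
  }
}
```

With Puppeteer or Playwright, `isExpectedState` could be as simple as `async () => (await page.$('.dashboard')) !== null`.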
Designing for Graceful Degradation
A well-designed HITL system should degrade gracefully when a human is not available or the rescue times out:
- Retry logic -- if the human solve times out, retry once before marking the task as failed.
- Partial progress saving -- before escalating, save whatever data the agent has collected so far. If the rescue fails, you do not lose all progress.
- Task queuing -- if the blocker is not time-sensitive, queue the task for retry later rather than failing immediately.
- Fallback paths -- can the agent achieve its goal via a different route that avoids the blocker? Check alternative approaches before escalating to a human.
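The first three items can be combined into one escalation wrapper. This is a sketch under stated assumptions: `rescue` wraps whatever HITL mechanism you use and resolves to `{ solved, error }` (the shape used in the managed-service example earlier), while `saveProgress`, `enqueueRetry`, and the `timeSensitive` flag are hypothetical. Fallback paths are not shown because they are specific to each task.

```javascript
// rescue: async () => ({ solved: boolean, error?: string })
// saveProgress: async (data) => void   (hypothetical)
// enqueueRetry: async (task) => void   (hypothetical)
async function escalateWithFallback({
  task,
  collected,
  rescue,
  saveProgress,
  enqueueRetry,
  maxAttempts = 2, // one initial attempt plus one retry
}) {
  // Persist partial results before handing off, so a failed rescue loses nothing.
  await saveProgress(collected);

  let last = { solved: false, error: 'not_attempted' };
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    last = await rescue();
    if (last.solved) return { status: 'resumed', attempts: attempt };
    // Only a timeout is worth retrying; other errors will not change.
    if (last.error !== 'timeout') break;
  }

  // Defer tasks that can wait instead of failing them outright.
  if (!task.timeSensitive) {
    await enqueueRetry(task);
    return { status: 'queued', error: last.error };
  }
  return { status: 'failed', error: last.error };
}
```

Retrying only on `timeout` is a design choice: an `unsolvable` result is unlikely to change on a second attempt, so the task goes straight to the queue or to failure.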
The goal is to make human intervention a routine part of the agent's execution model rather than an exceptional error path. Agents that treat HITL as a normal capability, like "click" or "type," are more resilient in production than agents that assume they can handle everything autonomously.