Integrating Human Rescue with Browser Use and Playwright
Browser Use is one of the most popular frameworks for building AI browser agents in Python. It wraps Playwright with an LLM-driven control loop: the agent sees the page, decides what to do, executes actions, and repeats. It works remarkably well for many tasks -- until it hits something it cannot handle.
This guide covers where Browser Use agents get stuck, how to add human-in-the-loop rescue using Pilot's Python integration, and what happens on the operator side when an agent asks for help.
Where Browser Use Agents Get Stuck
Browser Use agents make decisions by looking at the page (via screenshot or DOM extraction) and asking an LLM what to do next. This works for navigating sites, filling forms, and extracting data. But several categories of problems are outside the LLM's ability to solve:
- CAPTCHAs -- the agent can see the CAPTCHA but cannot solve it. Clicking on "I am not a robot" sometimes works for reCAPTCHA v2, but image selection tasks and Turnstile challenges are beyond current models.
- Two-factor authentication -- the agent sees "Enter the code from your authenticator app" but has no access to the phone or hardware key.
- Complex login flows -- OAuth redirects through multiple domains, SSO portals with company-specific configurations, or login pages that require specific cookie state.
- Anti-bot detection -- Cloudflare challenge pages, WAF blocks, and behavioral fingerprinting that detect Playwright's automation signatures.
- Infinite loops -- the agent keeps retrying the same action because the page is not changing (often due to a hidden overlay or modal the agent cannot see or interpret).
In each case, the Browser Use agent enters a failure loop: it tries an action, the page does not change as expected, and it tries again with minor variations until it runs out of retries or context window.
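One way to catch this loop in your own code, rather than relying on the LLM to notice it, is to fingerprint the page after each action and count how many consecutive actions left it unchanged. The sketch below is hypothetical. StuckDetector, its default threshold, and the choice of URL plus visible text as the fingerprint are illustrative assumptions, not part of Browser Use or Pilot.

```python
import hashlib
from typing import Optional


class StuckDetector:
    """Hypothetical helper: flags when consecutive actions leave the page unchanged."""

    def __init__(self, max_unchanged: int = 2) -> None:
        self.max_unchanged = max_unchanged
        self._last: Optional[str] = None
        self._unchanged = 0

    @staticmethod
    def fingerprint(url: str, visible_text: str) -> str:
        # Assumption: URL plus visible text is a good-enough proxy for page state
        return hashlib.sha256(f"{url}\n{visible_text}".encode("utf-8")).hexdigest()

    def record(self, url: str, visible_text: str) -> bool:
        """Record page state after an action; return True once the agent looks stuck."""
        fp = self.fingerprint(url, visible_text)
        if fp == self._last:
            # Same state as after the previous action: count another no-op
            self._unchanged += 1
        else:
            # Page changed: remember the new state and reset the counter
            self._last = fp
            self._unchanged = 0
        return self._unchanged >= self.max_unchanged
```

In a Playwright loop you might call detector.record(page.url, await page.inner_text("body")) after each action and escalate to a human once it returns True.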
Adding Pilot to a Browser Use Agent
Pilot provides a Python plugin that adds a request_human_help tool to the agent's available actions. When the LLM decides it needs help, it calls this tool, which triggers a human rescue session.
Installation
Basic Integration
```python
import asyncio

from browser_use import Agent
from langchain_openai import ChatOpenAI
from pilotapp import create_pilot_tools

# Create Pilot tools with your API key
pilot_tools = create_pilot_tools(
    api_key="pk_your_key",
    server_url="https://pilotapp.dev",
)


async def main() -> None:
    # Create the Browser Use agent with Pilot tools included
    agent = Agent(
        task="Log into example.com and download the monthly report",
        llm=ChatOpenAI(model="gpt-4o"),
        additional_tools=pilot_tools,
    )
    # Run the agent -- it will call request_human_help if stuck
    result = await agent.run()
    print(result)


asyncio.run(main())
```
With this setup, the agent's LLM now has access to a tool called request_human_help. The agent can decide to call it whenever it determines that it cannot proceed on its own. The LLM's system prompt is augmented to explain when this tool should be used.
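For intuition, a tool like this is typically presented to the model as a function-calling schema. The dict below uses the OpenAI-style tools format; the exact name, description text, and parameters that Pilot actually registers are assumptions here, not taken from Pilot's source.

```python
# Hypothetical schema in the OpenAI function-calling format; Pilot's real
# definition may word things differently or accept extra parameters.
REQUEST_HUMAN_HELP_TOOL = {
    "type": "function",
    "function": {
        "name": "request_human_help",
        "description": (
            "Ask a human operator to take over the live browser when you are "
            "blocked by a CAPTCHA, 2FA prompt, login flow, or bot check you "
            "cannot resolve yourself."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "description": {
                    "type": "string",
                    "description": "What you are stuck on, e.g. 'The site is asking for a 2FA code.'",
                },
            },
            "required": ["description"],
        },
    },
}
```

In this sketch the model only supplies a description of the blocker; the CDP URL comes from the integration, not from the model.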
How request_human_help Works
When the agent calls request_human_help, the following sequence happens:
- The agent provides a description of what it is stuck on, e.g., "I see a CAPTCHA that I cannot solve" or "The site is asking for a 2FA code."
- Pilot extracts the CDP URL from the Playwright browser instance. This is the WebSocket endpoint that allows remote connection to the live browser session.
- An API call is made to Pilot's rescue endpoint with the CDP URL and the description.
- A human operator connects to the browser session. They see exactly what the agent sees and can interact with the page normally.
- The operator solves the blocker (completes the CAPTCHA, enters a 2FA code, dismisses a modal, etc.) and marks the task as done.
- The API call returns with a success/failure result, and the agent continues from the new page state.
The entire process is synchronous from the agent's perspective. It calls the tool, waits, and gets back control of a browser that is now past the blocker.
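The sequence above can be sketched as a single blocking function. This is a minimal illustration, not Pilot's implementation: request_human_help_sketch and the injected post callable are hypothetical, and the solved/error result shape is assumed from the fields used in the Playwright and error-handling examples in this guide.

```python
from typing import Callable


def request_human_help_sketch(
    description: str,
    cdp_url: str,
    post: Callable[[dict], dict],
) -> dict:
    """Hypothetical sketch of the rescue sequence from the agent's side."""
    # Step 1: the agent must say what it is stuck on
    if not description.strip():
        raise ValueError("description must explain the blocker")
    # Step 2: the CDP URL of the live browser is what lets an operator attach
    payload = {"cdp_url": cdp_url, "description": description}
    # Steps 3-5: post() blocks while an operator connects and works on the page
    result = post(payload)
    # Step 6: normalize the outcome so the agent loop can branch on it
    return {"solved": bool(result.get("solved")), "error": result.get("error")}
```

Injecting post keeps the control flow testable without a network; in practice it would wrap an HTTP call with a long timeout, since the operator may take minutes.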
What the Operator Sees
On the operator side, Pilot provides a web-based interface that shows incoming rescue requests. Each request includes:
- The agent's description of the problem
- A live view of the browser session via CDP
- Full mouse and keyboard control over the browser
- A button to mark the task as solved or unsolvable
The operator does not need to install anything or have access to the agent's code. They interact with the browser through their web browser, solve the problem, and move on to the next request.
Configuring the Agent's Behavior
You can control when the agent asks for help by providing additional context in the task description or system prompt:
```python
agent = Agent(
    task="""
    Log into example.com with the provided credentials.
    If you encounter a CAPTCHA, 2FA prompt, or any
    blocker you cannot handle, use request_human_help
    immediately. Do not retry more than twice before
    asking for help.
    """,
    llm=ChatOpenAI(model="gpt-4o"),
    additional_tools=pilot_tools,
)
```
This matters because LLMs have a tendency to keep retrying failed actions rather than asking for help. Explicit instructions to escalate early save time and LLM token costs.
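If you run many tasks, it can help to generate these escalation instructions in one place so every task uses the same retry budget. build_task below is a hypothetical helper; its wording is simply the example prompt above with the retry limit turned into a parameter.

```python
def build_task(goal: str, max_retries: int = 2) -> str:
    """Hypothetical helper: append the same escalation rules to every task goal."""
    if max_retries < 0:
        raise ValueError("max_retries must be non-negative")
    return (
        f"{goal.strip()}\n"
        "If you encounter a CAPTCHA, 2FA prompt, or any blocker you cannot "
        "handle, use request_human_help immediately. "
        f"Do not retry a failed action more than {max_retries} times before asking for help."
    )
```

Usage would look like Agent(task=build_task("Log into example.com with the provided credentials."), llm=..., additional_tools=pilot_tools).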
Direct Playwright Integration (Without Browser Use)
If you are using Playwright directly without Browser Use, you can call Pilot's API from your automation script at any point where the agent detects a blocker:
```python
import asyncio

import httpx
from playwright.async_api import async_playwright


async def rescue_with_pilot(cdp_url: str, description: str) -> dict:
    """Call Pilot's rescue endpoint directly."""
    async with httpx.AsyncClient() as client:
        resp = await client.post(
            "https://pilotapp.dev/rescue",
            json={
                "cdp_url": cdp_url,
                "description": description,
            },
            headers={"Authorization": "Bearer pk_your_key"},
            timeout=300.0,  # 5 minute timeout
        )
        return resp.json()


async def main(cdp_url: str) -> None:
    # In your Playwright script:
    async with async_playwright() as p:
        browser = await p.chromium.connect_over_cdp(cdp_url)
        page = browser.contexts[0].pages[0]
        # ... navigate and interact ...

        # Agent detects a CAPTCHA
        captcha = await page.query_selector('iframe[src*="recaptcha"]')
        if captcha:
            result = await rescue_with_pilot(cdp_url, "CAPTCHA on login page")
            if result.get("solved"):
                # Continue automation
                await page.wait_for_selector(".dashboard")


# Placeholder: the CDP WebSocket URL of your remote browser
asyncio.run(main("wss://your-browser-host/..."))
```
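The reCAPTCHA check above can be generalized by inspecting the URLs of every frame on the page, since common challenge widgets load in iframes from recognizable domains. The URL fragments below are assumptions based on typical embed URLs for reCAPTCHA, hCaptcha, and Cloudflare Turnstile, and may need adjusting for the sites you target; describe_blocker is a hypothetical helper.

```python
from typing import Iterable, Optional

# Assumed URL fragments for common challenge widgets; verify against real pages
BLOCKER_PATTERNS = {
    "/recaptcha/": "reCAPTCHA challenge",
    "hcaptcha.com": "hCaptcha challenge",
    "challenges.cloudflare.com": "Cloudflare Turnstile or challenge page",
}


def describe_blocker(frame_urls: Iterable[str]) -> Optional[str]:
    """Return a description of the first known blocker found, or None."""
    for url in frame_urls:
        lowered = url.lower()
        for fragment, label in BLOCKER_PATTERNS.items():
            if fragment in lowered:
                # The description doubles as the message sent to the operator
                return f"{label} detected in frame {url}"
    return None
```

With Playwright this could be called as describe_blocker(frame.url for frame in page.frames), passing any non-None result to rescue_with_pilot as the description.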
Browser Requirements
For Pilot to connect to your browser session, the browser needs to be accessible via a CDP WebSocket URL. This works out of the box with:
- Cloud browser providers -- Browserbase, Browserless, and similar services expose CDP URLs by default.
- Self-hosted Chrome -- launch Chrome with --remote-debugging-port=9222 and expose the port.
- Playwright's CDP mode -- use playwright.chromium.connect_over_cdp() instead of playwright.chromium.launch() for remote browsers.
Browsers launched locally via playwright.chromium.launch() are controlled over a local connection and expose no CDP endpoint that is reachable from the internet. For Pilot to work, the browser must be running somewhere network-accessible (a cloud VM, a container, or a cloud browser service).
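Before sending a rescue request, it can be worth failing fast when the CDP URL points at the local machine, since a remote operator could never reach it. The pre-flight check below is a hypothetical helper, not part of Pilot; it only inspects the host portion of the URL and treats DNS names as potentially reachable.

```python
import ipaddress
from urllib.parse import urlparse


def is_loopback_cdp_url(cdp_url: str) -> bool:
    """Hypothetical pre-flight check: True if the CDP URL only works on this machine."""
    host = (urlparse(cdp_url).hostname or "").lower()
    if not host:
        raise ValueError(f"CDP URL has no host: {cdp_url!r}")
    if host == "localhost" or host.endswith(".localhost"):
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (e.g. a provider hostname); assume it may be reachable
        return False
```

A check like this catches the common mistake of passing a ws://127.0.0.1:9222/... URL copied from a local test run into a rescue request.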
Error Handling and Timeouts
Rescue requests can fail for several reasons, and your agent should handle each case:
```python
error = result.get("error")
if result.get("solved"):
    # Success -- page is past the blocker
    pass
elif error == "timeout":
    # No operator picked it up in time -- retry or skip
    pass
elif error == "unsolvable":
    # Operator saw it but could not solve it
    pass
elif error == "browser_died":
    # Browser session was lost -- need to restart
    pass
```
For production systems, wrap the rescue call in retry logic with exponential backoff. A timeout usually means no operator was available -- retrying after a short wait often succeeds when an operator becomes free.
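A minimal sketch of that retry logic follows, assuming the solved/error result shape used above. rescue_with_retry and its default delays are hypothetical; it retries only on "timeout", because "unsolvable" and "browser_died" will not improve by waiting, and it adds random jitter so many agents do not retry in lockstep.

```python
import asyncio
import random
from typing import Awaitable, Callable


async def rescue_with_retry(
    rescue: Callable[[], Awaitable[dict]],
    max_attempts: int = 4,
    base_delay: float = 5.0,
    max_delay: float = 60.0,
) -> dict:
    """Retry a rescue call on timeout, doubling the wait (with jitter) each time."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    result: dict = {}
    for attempt in range(max_attempts):
        result = await rescue()
        # Success or a non-timeout failure ends the loop immediately
        if result.get("solved") or result.get("error") != "timeout":
            return result
        if attempt < max_attempts - 1:
            # Exponential backoff capped at max_delay, scaled by random jitter
            delay = min(max_delay, base_delay * 2 ** attempt)
            await asyncio.sleep(delay * random.uniform(0.5, 1.0))
    return result
```

Because rescue is a zero-argument factory, each attempt creates a fresh request, e.g. await rescue_with_retry(lambda: rescue_with_pilot(cdp_url, "CAPTCHA on login page")).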
When to Use Browser Use vs. Direct Playwright
Browser Use is the right choice when your task requires adaptive navigation -- the agent needs to figure out how to interact with pages it has never seen before. The LLM decides what to click, what to type, and how to navigate.
Direct Playwright scripting is better for well-defined, repeatable flows where you know exactly which selectors to use and what sequence of actions to perform. It is faster, cheaper (no LLM calls), and more reliable for known workflows.
In both cases, human-in-the-loop handles the same category of problems: blockers that neither code nor an LLM can resolve on their own. The integration approach differs (tool registration for Browser Use, direct API call for Playwright), but the underlying rescue mechanism is identical.