Integrating Human Rescue with Browser Use and Playwright

Updated April 2026

Browser Use is one of the most popular frameworks for building AI browser agents in Python. It wraps Playwright with an LLM-driven control loop: the agent sees the page, decides what to do, executes actions, and repeats. It works remarkably well for many tasks -- until it hits something it cannot handle.

This guide covers where Browser Use agents get stuck, how to add human-in-the-loop rescue using Pilot's Python integration, and what happens on the operator side when an agent asks for help.

Where Browser Use Agents Get Stuck

How Browser Use Makes Decisions

Browser Use agents make decisions by looking at the page (via screenshot or DOM extraction) and asking an LLM what to do next. This works for navigating sites, filling forms, and extracting data. But several categories of problems are outside the LLM's ability to solve:

Problems Outside the LLM's Reach

CAPTCHAs -- the agent can see the CAPTCHA but usually cannot solve it. Clicking the "I'm not a robot" checkbox sometimes works for reCAPTCHA v2, but image-selection tasks and Turnstile challenges are generally beyond what current models handle reliably.
Two-factor authentication -- the agent sees "Enter the code from your authenticator app" but has no access to the phone or hardware key.
Complex login flows -- OAuth redirects through multiple domains, SSO portals with company-specific configurations, or login pages that require specific cookie state.
Anti-bot detection -- Cloudflare challenge pages, WAF blocks, and behavioral fingerprinting that detect Playwright's automation signatures.
Infinite loops -- the agent keeps retrying the same action because the page is not changing (often due to a hidden overlay or modal the agent cannot see or interpret).

The Failure Loop Pattern

In each case, the Browser Use agent enters a failure loop: it tries an action, the page does not change as expected, and it tries again with minor variations until it runs out of retries or exhausts its context window.
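One way to catch this pattern before the retry budget runs out is to count how often the same action is attempted against an unchanged page. The sketch below is illustrative only: LoopDetector, its max_repeats threshold, and the choice to fingerprint a step by action, URL, and page text are all assumptions, not part of Browser Use or Pilot.

```python
import hashlib
from collections import Counter


class LoopDetector:
    """Hypothetical helper: flags when the same action repeats on an unchanged page."""

    def __init__(self, max_repeats: int = 2) -> None:
        self.max_repeats = max_repeats
        self._seen = Counter()

    def record(self, action: str, url: str, dom_text: str) -> bool:
        """Record one step; return True once this exact step has repeated too often."""
        fingerprint = hashlib.sha256(
            f"{action}|{url}|{dom_text}".encode("utf-8")
        ).hexdigest()
        self._seen[fingerprint] += 1
        return self._seen[fingerprint] > self.max_repeats
```

When record() returns True, the surrounding loop could stop retrying and escalate to a human instead of spending more tokens on the same action.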

Adding Pilot to a Browser Use Agent

Pilot provides a Python plugin that adds a request_human_help tool to the agent's available actions. When the LLM decides it needs help, it calls this tool, which triggers a human rescue session.

Installation

pip install pilotapp

Basic Integration

import asyncio

from browser_use import Agent
from langchain_openai import ChatOpenAI
from pilotapp import create_pilot_tools

# Create Pilot tools with your API key
pilot_tools = create_pilot_tools(
    api_key="pk_your_key",
    server_url="https://pilotapp.dev",
)

# Create the Browser Use agent with Pilot tools included
agent = Agent(
    task="Log into example.com and download the monthly report",
    llm=ChatOpenAI(model="gpt-4o"),
    additional_tools=pilot_tools,
)


async def main():
    # Run the agent -- it will call request_human_help if stuck
    return await agent.run()


# agent.run() is a coroutine, so it needs an event loop
result = asyncio.run(main())

What the LLM Now Has Access To

With this setup, the agent's LLM now has access to a tool called request_human_help. The agent can decide to call it whenever it determines that it cannot proceed on its own. The LLM's system prompt is augmented to explain when this tool should be used.

How request_human_help Works

Step-by-Step Rescue Sequence

When the agent calls request_human_help, the following sequence happens:

The agent provides a description of what it is stuck on, e.g., "I see a CAPTCHA that I cannot solve" or "The site is asking for a 2FA code."
Pilot extracts the CDP URL from the Playwright browser instance. This is the WebSocket endpoint that allows remote connection to the live browser session.
An API call is made to Pilot's rescue endpoint with the CDP URL and the description.
A human operator connects to the browser session. They see exactly what the agent sees and can interact with the page normally.
The operator solves the blocker (completes the CAPTCHA, enters a 2FA code, dismisses a modal, etc.) and marks the task as done.
The API call returns with a success/failure result, and the agent continues from the new page state.

Synchronous from the Agent's Perspective

The entire process is synchronous from the agent's perspective. It calls the tool, waits, and gets back control of a browser that is now past the blocker.

What the Operator Sees

The Operator Interface

On the operator side, Pilot provides a web-based interface that shows incoming rescue requests. Each request includes:

The agent's description of the problem
A live view of the browser session via CDP
Full mouse and keyboard control over the browser
A button to mark the task as solved or unsolvable

Zero-Install Access for Operators

The operator does not need to install anything or have access to the agent's code. They interact with the browser through their web browser, solve the problem, and move on to the next request.

Configuring the Agent's Behavior

Instructing the Agent When to Escalate

You can control when the agent asks for help by providing additional context in the task description or system prompt:

agent = Agent(
    task="""
        Log into example.com with the provided credentials.
        If you encounter a CAPTCHA, 2FA prompt, or any
        blocker you cannot handle, use request_human_help
        immediately. Do not retry more than twice before
        asking for help.
    """,
    llm=ChatOpenAI(model="gpt-4o"),
    additional_tools=pilot_tools,
)

Why Early Escalation Saves Tokens

This matters because LLMs tend to keep retrying failed actions rather than asking for help. Explicit instructions to escalate early save both time and LLM token costs.

Direct Playwright Integration (Without Browser Use)

Calling the Rescue API From Playwright

If you are using Playwright directly without Browser Use, you can call Pilot's API from your automation script at any point where the script detects a blocker:

import httpx
from playwright.async_api import async_playwright


async def rescue_with_pilot(cdp_url: str, description: str) -> dict:
    """Call Pilot's rescue endpoint directly."""
    async with httpx.AsyncClient() as client:
        resp = await client.post(
            "https://pilotapp.dev/rescue",
            json={
                "cdp_url": cdp_url,
                "description": description,
            },
            headers={"Authorization": "Bearer pk_your_key"},
            timeout=300.0,  # 5 minute timeout
        )
        return resp.json()


# In your Playwright script (cdp_url is the WebSocket endpoint of the remote browser):
async def run(cdp_url: str) -> None:
    async with async_playwright() as p:
        browser = await p.chromium.connect_over_cdp(cdp_url)
        page = browser.contexts[0].pages[0]

        # ... navigate and interact ...

        # Agent detects a CAPTCHA
        captcha = await page.query_selector('iframe[src*="recaptcha"]')
        if captcha:
            result = await rescue_with_pilot(cdp_url, "CAPTCHA on login page")
            if result.get("solved"):
                # Continue automation
                await page.wait_for_selector(".dashboard")
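The single reCAPTCHA check can be generalized into a small lookup of known blockers. This is a rough sketch rather than a Pilot feature: detect_blocker and BLOCKER_SELECTORS are hypothetical names, and the Turnstile and 2FA selectors are assumptions that will need tuning for the sites you automate. It relies only on Playwright's page.query_selector(), which resolves to None when nothing matches.

```python
from typing import Optional

# Selector -> description pairs. These selectors are illustrative assumptions;
# real sites vary, so adjust them for the pages you automate.
BLOCKER_SELECTORS = [
    ('iframe[src*="recaptcha"]', "reCAPTCHA challenge"),
    ('iframe[src*="challenges.cloudflare.com"]', "Cloudflare Turnstile challenge"),
    ('input[autocomplete="one-time-code"]', "2FA code prompt"),
]


async def detect_blocker(page) -> Optional[str]:
    """Return a description of the first known blocker on the page, or None."""
    for selector, description in BLOCKER_SELECTORS:
        # query_selector resolves to None when no element matches
        if await page.query_selector(selector) is not None:
            return description
    return None
```

The returned description can be passed straight through as the description argument of rescue_with_pilot, so the operator knows what to expect before connecting.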

Browser Requirements

CDP WebSocket Access

For Pilot to connect to your browser session, the browser needs to be accessible via a CDP WebSocket URL. This works out of the box with:

Supported Browser Setups

Cloud browser providers -- Browserbase, Browserless, and similar services expose CDP URLs by default.
Self-hosted Chrome -- launch Chrome with --remote-debugging-port=9222 and expose the port.
Playwright's CDP mode -- use playwright.chromium.connect_over_cdp() instead of playwright.chromium.launch() for remote browsers.

Why Local Browsers Do Not Work

Browsers launched locally via playwright.chromium.launch() are not reachable from the internet, so Pilot has no CDP endpoint to connect to. For Pilot to work, the browser must be running somewhere network-accessible (a cloud VM, a container, or a cloud browser service).
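For a self-hosted Chrome started with --remote-debugging-port=9222, the CDP WebSocket URL can be read from Chrome's /json/version endpoint, which reports it in the webSocketDebuggerUrl field. The sketch below shows one way to fetch it and swap in a hostname that is reachable from outside the machine. The function names and the public_host value are hypothetical, and how the port is exposed (tunnel, reverse proxy, firewall rules) is left out; since CDP grants full control of the browser, that exposure should be restricted.

```python
import json
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen


def public_cdp_url(version_json: str, public_host: str) -> str:
    """Swap the host in Chrome's reported WebSocket URL for a reachable one."""
    ws_url = json.loads(version_json)["webSocketDebuggerUrl"]
    parts = urlsplit(ws_url)
    port = f":{parts.port}" if parts.port else ""
    return urlunsplit(parts._replace(netloc=f"{public_host}{port}"))


def fetch_cdp_url(debug_host: str, public_host: str, port: int = 9222) -> str:
    """Query Chrome's /json/version endpoint and return a shareable CDP URL."""
    with urlopen(f"http://{debug_host}:{port}/json/version", timeout=10) as resp:
        return public_cdp_url(resp.read().decode("utf-8"), public_host)
```

The resulting URL is what would be passed as cdp_url to connect_over_cdp() and to the rescue request.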

Error Handling and Timeouts

Result States to Handle

Rescue requests can fail for several reasons, and your agent should handle each case:

result = await rescue_with_pilot(cdp_url, "stuck on verification page")
error = result.get("error")  # .get() avoids a KeyError when the field is absent

if result.get("solved"):
    # Success -- page is past the blocker
    pass
elif error == "timeout":
    # No operator picked it up in time -- retry or skip
    pass
elif error == "unsolvable":
    # Operator saw it but could not solve it
    pass
elif error == "browser_died":
    # Browser session was lost -- need to restart
    pass

Retry Strategy for Production

For production systems, wrap the rescue call in retry logic with exponential backoff. A timeout usually means no operator was available -- retrying after a short wait often succeeds when an operator becomes free.
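A minimal sketch of such a retry wrapper follows. It assumes the result dictionaries shown above ("solved" and "error" keys); rescue_with_retry, its parameters, and the default delays are hypothetical choices, not part of Pilot's API. Only timeouts are retried, since an "unsolvable" or "browser_died" result will not improve by waiting.

```python
import asyncio
from typing import Awaitable, Callable


async def rescue_with_retry(
    rescue: Callable[[], Awaitable[dict]],
    max_attempts: int = 3,
    base_delay: float = 5.0,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> dict:
    """Retry a rescue call on timeout, doubling the wait between attempts."""
    result: dict = {}
    for attempt in range(max_attempts):
        result = await rescue()
        # Only a timeout is worth retrying; every other outcome is final.
        if result.get("solved") or result.get("error") != "timeout":
            return result
        if attempt < max_attempts - 1:
            await sleep(base_delay * (2 ** attempt))
    return result
```

With the rescue_with_pilot helper from earlier, a call might look like await rescue_with_retry(lambda: rescue_with_pilot(cdp_url, "stuck on verification page")). The sleep parameter is injectable so the backoff can be tested without real delays.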

When to Use Browser Use vs. Direct Playwright

Browser Use for Adaptive Navigation

Browser Use is the right choice when your task requires adaptive navigation -- the agent needs to figure out how to interact with pages it has never seen before. The LLM decides what to click, what to type, and how to navigate.

Playwright for Known, Repeatable Flows

Direct Playwright scripting is better for well-defined, repeatable flows where you know exactly which selectors to use and what sequence of actions to perform. It is faster, cheaper (no LLM calls), and more reliable for known workflows.

Same Rescue Mechanism for Both

In both cases, human-in-the-loop rescue handles the same category of problems: blockers that neither code nor an LLM can resolve on its own. The integration approach differs (tool registration for Browser Use, direct API call for Playwright), but the underlying rescue mechanism is identical.