Integrating Human Rescue with Browser Use and Playwright

Browser Use is one of the most popular frameworks for building AI browser agents in Python. It wraps Playwright with an LLM-driven control loop: the agent sees the page, decides what to do, executes actions, and repeats. It works remarkably well for many tasks -- until it hits something it cannot handle.

This guide covers where Browser Use agents get stuck, how to add human-in-the-loop rescue using Pilot's Python integration, and what happens on the operator side when an agent asks for help.

Where Browser Use Agents Get Stuck

Browser Use agents make decisions by looking at the page (via screenshot or DOM extraction) and asking an LLM what to do next. This works for navigating sites, filling forms, and extracting data. But several categories of problems are outside the LLM's ability to solve, including CAPTCHAs, 2FA prompts, and modals or verification pages it cannot get past.

In each case, the Browser Use agent enters a failure loop: it tries an action, the page does not change as expected, and it tries again with minor variations until it runs out of retries or context window.

Adding Pilot to a Browser Use Agent

Pilot provides a Python plugin that adds a request_human_help tool to the agent's available actions. When the LLM decides it needs help, it calls this tool, which triggers a human rescue session.

Installation

pip install pilotapp

Basic Integration

import asyncio

from browser_use import Agent
from langchain_openai import ChatOpenAI
from pilotapp import create_pilot_tools

# Create Pilot tools with your API key
pilot_tools = create_pilot_tools(
  api_key="pk_your_key",
  server_url="https://pilotapp.dev",
)

async def main():
  # Create the Browser Use agent with Pilot tools included
  agent = Agent(
    task="Log into example.com and download the monthly report",
    llm=ChatOpenAI(model="gpt-4o"),
    additional_tools=pilot_tools,
  )

  # Run the agent -- it will call request_human_help if stuck
  return await agent.run()

# agent.run() is a coroutine, so it needs a running event loop
result = asyncio.run(main())

With this setup, the agent's LLM now has access to a tool called request_human_help. The agent can decide to call it whenever it determines that it cannot proceed on its own. The LLM's system prompt is augmented to explain when this tool should be used.

How request_human_help Works

When the agent calls request_human_help, the following sequence happens:

  1. The agent provides a description of what it is stuck on, e.g., "I see a CAPTCHA that I cannot solve" or "The site is asking for a 2FA code."
  2. Pilot extracts the CDP URL from the Playwright browser instance. This is the WebSocket endpoint that allows remote connection to the live browser session.
  3. An API call is made to Pilot's rescue endpoint with the CDP URL and the description.
  4. A human operator connects to the browser session. They see exactly what the agent sees and can interact with the page normally.
  5. The operator solves the blocker (completes the CAPTCHA, enters a 2FA code, dismisses a modal, etc.) and marks the task as done.
  6. The API call returns with a success/failure result, and the agent continues from the new page state.

The entire process is synchronous from the agent's perspective. It calls the tool, waits, and gets back control of a browser that is now past the blocker.
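The sequence above can be sketched as a single awaitable call. This is a minimal model of the documented flow, not Pilot's actual implementation: the names RescueResult, Transport, and fake_operator are hypothetical, and the payload fields (cdp_url, description, solved, error) are assumed to match the ones used in the direct API example later in this guide.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional


@dataclass
class RescueResult:
    # Hypothetical result type; mirrors the "solved"/"error" fields used in this guide.
    solved: bool
    error: Optional[str] = None


# A transport sends the request and resolves only once the operator is done.
Transport = Callable[[dict], Awaitable[dict]]


async def request_human_help(
    description: str, cdp_url: str, send: Transport
) -> RescueResult:
    """Describe the blocker, hand over the CDP URL, and wait for the outcome."""
    payload = {"cdp_url": cdp_url, "description": description}
    # From the agent's side this is one blocking await (steps 3 to 6).
    response = await send(payload)
    return RescueResult(
        solved=bool(response.get("solved")),
        error=response.get("error"),
    )


async def fake_operator(payload: dict) -> dict:
    """Stand-in for the rescue endpoint plus the human operator (steps 4 and 5)."""
    await asyncio.sleep(0)  # a real call would wait for a person here
    return {"solved": True}
```

Passing the transport in as a parameter keeps the sketch testable without a network: swap fake_operator for a function that posts to the real endpoint.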

What the Operator Sees

On the operator side, Pilot provides a web-based interface that shows incoming rescue requests. Each request includes the agent's description of the blocker and a live view of the browser session.

The operator does not need to install anything or have access to the agent's code. They interact with the browser through their web browser, solve the problem, and move on to the next request.

Configuring the Agent's Behavior

You can control when the agent asks for help by providing additional context in the task description or system prompt:

agent = Agent(
  task="""
    Log into example.com with the provided credentials.
    If you encounter a CAPTCHA, 2FA prompt, or any
    blocker you cannot handle, use request_human_help
    immediately. Do not retry more than twice before
    asking for help.
  """,
  llm=ChatOpenAI(model="gpt-4o"),
  additional_tools=pilot_tools,
)

This matters because LLMs tend to keep retrying failed actions rather than asking for help. Explicit instructions to escalate early save time and reduce token costs.
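In scripts that drive the browser directly, the same "retry twice, then ask" policy can be enforced in code instead of being left to the prompt. The following is a minimal sketch under that assumption; act_or_escalate and its parameters are hypothetical names, not part of Browser Use or Pilot.

```python
import asyncio
from typing import Awaitable, Callable


async def act_or_escalate(
    action: Callable[[], Awaitable[bool]],
    escalate: Callable[[], Awaitable[bool]],
    max_retries: int = 2,
) -> bool:
    """Run an action; after the first attempt plus max_retries failures, ask a human."""
    for _ in range(1 + max_retries):
        if await action():
            return True
    # Out of retries: hand over instead of looping further.
    return await escalate()
```

Here action returns True when the page reached the expected state, and escalate would wrap whatever rescue call the script uses.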

Direct Playwright Integration (Without Browser Use)

If you are using Playwright directly without Browser Use, you can call Pilot's API from your automation script at any point where the agent detects a blocker:

import httpx
from playwright.async_api import async_playwright

async def rescue_with_pilot(cdp_url: str, description: str) -> dict:
  """Call Pilot's rescue endpoint directly."""
  async with httpx.AsyncClient() as client:
    resp = await client.post(
      "https://pilotapp.dev/rescue",
      json={
        "cdp_url": cdp_url,
        "description": description,
      },
      headers={"Authorization": "Bearer pk_your_key"},
      timeout=300.0, # 5 minute timeout
    )
    return resp.json()

# In your Playwright script (run with asyncio.run(main(cdp_url))):
async def main(cdp_url: str):
  async with async_playwright() as p:
    # cdp_url is the WebSocket endpoint of your network-accessible browser
    browser = await p.chromium.connect_over_cdp(cdp_url)
    page = browser.contexts[0].pages[0]

    # ... navigate and interact ...

    # Agent detects a CAPTCHA
    captcha = await page.query_selector('iframe[src*="recaptcha"]')
    if captcha:
      result = await rescue_with_pilot(cdp_url, "CAPTCHA on login page")
      if result.get("solved"):
        # Continue automation
        await page.wait_for_selector(".dashboard")

Browser Requirements

For Pilot to connect to your browser session, the browser needs to be accessible via a CDP WebSocket URL that Pilot can reach over the network.

Local browsers launched via playwright.chromium.launch() are not accessible from the internet: any CDP endpoint they expose (for example via Chromium's --remote-debugging-port flag) is local to the machine. For Pilot to work, the browser must be running somewhere network-accessible (a cloud VM, a container, or a cloud browser service).
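A quick pre-flight check can catch the most common mistake, which is handing over a CDP URL that points at the local machine. The helper below is a rough heuristic using only the standard library; the function name is hypothetical, and it cannot prove that a hostname is reachable, only flag addresses that clearly are not.

```python
import ipaddress
from urllib.parse import urlparse


def is_probably_remote_reachable(cdp_url: str) -> bool:
    """Return False for CDP URLs that point at loopback or private addresses."""
    host = urlparse(cdp_url).hostname
    if host is None or host == "localhost":
        return False
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # A DNS name: cannot tell without resolving it, so give it the benefit of the doubt.
        return True
    return not (
        addr.is_loopback
        or addr.is_private
        or addr.is_link_local
        or addr.is_unspecified
    )
```

Calling this before a rescue request lets the script fail fast with a clear message instead of waiting for an operator who can never connect.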

Error Handling and Timeouts

Rescue requests can fail for several reasons, and your agent should handle each case:

result = await rescue_with_pilot(cdp_url, "stuck on verification page")

# Use .get() so a response without an "error" or "solved" key does not raise KeyError
if result.get("solved"):
  # Success -- page is past the blocker
  pass
elif result.get("error") == "timeout":
  # No operator picked it up in time -- retry or skip
  pass
elif result.get("error") == "unsolvable":
  # Operator saw it but could not solve it
  pass
elif result.get("error") == "browser_died":
  # Browser session was lost -- need to restart
  pass

For production systems, wrap the rescue call in retry logic with exponential backoff. A timeout usually means no operator was available -- retrying after a short wait often succeeds when an operator becomes free.
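One way to structure that retry logic is sketched below. It assumes the result dictionary carries the solved and error fields shown above and treats only "timeout" as retryable; rescue_with_retry, max_attempts, and base_delay are hypothetical names and values chosen for illustration.

```python
import asyncio
from typing import Awaitable, Callable


async def rescue_with_retry(
    rescue: Callable[[], Awaitable[dict]],
    max_attempts: int = 3,
    base_delay: float = 5.0,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> dict:
    """Retry a rescue call on timeout, doubling the wait between attempts."""
    result: dict = {"solved": False, "error": "timeout"}
    for attempt in range(max_attempts):
        result = await rescue()
        # Only a timeout is worth retrying; other errors will not fix themselves.
        if result.get("solved") or result.get("error") != "timeout":
            return result
        if attempt < max_attempts - 1:
            await sleep(base_delay * 2 ** attempt)
    return result
```

With the direct API helper from earlier, a call might look like `await rescue_with_retry(lambda: rescue_with_pilot(cdp_url, "stuck on verification page"))`. The sleep parameter exists so the delay can be replaced in tests.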

When to Use Browser Use vs. Direct Playwright

Browser Use is the right choice when your task requires adaptive navigation -- the agent needs to figure out how to interact with pages it has never seen before. The LLM decides what to click, what to type, and how to navigate.

Direct Playwright scripting is better for well-defined, repeatable flows where you know exactly which selectors to use and what sequence of actions to perform. It is faster, cheaper (no LLM calls), and more reliable for known workflows.

In both cases, human-in-the-loop handles the same category of problems: blockers that neither code nor an LLM can resolve on their own. The integration approach differs (tool registration for Browser Use, direct API call for Playwright), but the underlying rescue mechanism is identical.