Agent Browser

Built-in web browsing for the container agent. Open pages, fill forms, take screenshots, and extract data.

What it does

  • Navigate to any URL, go back/forward, reload
  • Snapshot the page to get interactive elements with clickable refs
  • Fill forms, click buttons, select dropdowns, upload files
  • Take screenshots and export pages as PDF
  • Extract text, HTML, attributes, and element counts
  • Save and restore authentication state across sessions

What you'll need

  • NanoClaw installed and running

Install

Built-in

How it works

Every NanoClaw container comes with agent-browser, a CLI tool built on Playwright that gives the agent full browser automation. The agent uses it to browse the web, interact with pages, and extract information — without needing a display server or GUI.

The core workflow is simple: navigate to a page, take a snapshot to see what’s there, and interact using element references. A snapshot returns an accessibility tree where each interactive element has a ref like @e1, @e2. The agent clicks, fills, and reads elements by ref, then re-snapshots after the page changes.
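The ref-based loop above can be illustrated with a small Python sketch. It only shows how refs of the form @e1, @e2 might be pulled out of snapshot text and compared across two snapshots. The snapshot layout in SAMPLE_SNAPSHOT and the helper names extract_refs and stale_refs are assumptions made for illustration; the real agent-browser output may look different.

```python
import re

# Hypothetical snapshot text, invented for illustration only.
# The actual accessibility-tree output of agent-browser may differ.
SAMPLE_SNAPSHOT = """\
heading "Sign in"
textbox "Email" @e1
textbox "Password" @e2
button "Log in" @e3
"""

# Refs look like @e1, @e2, ... according to the description above.
REF_PATTERN = re.compile(r"@e\d+")


def extract_refs(snapshot: str) -> list[str]:
    """Return element refs in order of first appearance, without duplicates."""
    seen: dict[str, None] = {}
    for ref in REF_PATTERN.findall(snapshot):
        seen.setdefault(ref, None)
    return list(seen)


def stale_refs(old_snapshot: str, new_snapshot: str) -> list[str]:
    """Return refs from the old snapshot that are absent from the new one."""
    current = set(extract_refs(new_snapshot))
    return [ref for ref in extract_refs(old_snapshot) if ref not in current]


if __name__ == "__main__":
    print(extract_refs(SAMPLE_SNAPSHOT))
    # After a page change, refs from the earlier snapshot may no longer exist.
    print(stale_refs(SAMPLE_SNAPSHOT, 'button "Log out" @e1'))
```

The second helper shows why re-snapshotting matters: a ref taken before the page changed may not exist afterwards, so it should not be reused without a fresh snapshot.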

This is a built-in capability — there’s no skill to install and no configuration needed. The agent decides when to use the browser based on the task.

What the agent can do

Research and reading. The agent opens URLs, reads page content, and extracts specific data. It can navigate through multi-page results, follow links, and pull text or attributes from any element.

Form interaction. The agent fills inputs, checks boxes, selects dropdown options, and submits forms. It can handle multi-step flows like login pages, checkout processes, or multi-part forms.

Screenshots and PDFs. The agent captures screenshots of full pages or visible areas, and can export pages as PDFs. This is useful when the user needs a visual record of something on the web.

Authentication. The agent can log into sites and save the session state to a file. On future runs, it loads the saved state and skips the login flow. This works across container restarts as long as the state file is in a mounted directory.
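Whether a saved state file survives a restart depends only on where it lives. The sketch below checks a path against a list of mounted directories. The mount point /workspace/group and the function name survives_restart are hypothetical; substitute the directories your container actually mounts.

```python
from pathlib import PurePosixPath

# Hypothetical mount point, used only for illustration.
MOUNTED_DIRS = [PurePosixPath("/workspace/group")]


def survives_restart(state_file: str, mounted_dirs=MOUNTED_DIRS) -> bool:
    """Return True if state_file lies inside one of the mounted directories.

    The check is purely lexical: it does not touch the filesystem, so it
    rejects relative paths and paths containing ".." rather than resolving them.
    """
    path = PurePosixPath(state_file)
    if not path.is_absolute() or ".." in path.parts:
        return False
    return any(path.is_relative_to(root) for root in mounted_dirs)


if __name__ == "__main__":
    print(survives_restart("/workspace/group/auth.json"))
    print(survives_restart("/tmp/auth.json"))
```

A path outside every mounted directory exists only in the container's own filesystem and is lost when the container is recreated.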

Limitations

  • The browser runs headless inside a container. There’s no way to see it in real time.
  • Sites with aggressive bot detection (CAPTCHAs, Cloudflare challenges) may block automated access.
  • JavaScript-heavy single-page applications (SPAs) work, but their content often renders after navigation completes, so the agent may need to wait for elements to appear before interacting with them.
  • The browser has no access to your host machine’s browser profile, cookies, or saved passwords.
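Waiting for elements on a JavaScript-heavy page usually comes down to polling with a timeout. The sketch below is a generic polling helper, not part of agent-browser. The name wait_for and its parameters are hypothetical, and check stands in for whatever the caller uses to look for an element, such as taking a new snapshot and searching it.

```python
import time
from typing import Callable, Optional


def wait_for(
    check: Callable[[], Optional[str]],
    timeout: float = 10.0,
    interval: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call check() until it returns a non-None value or the timeout expires.

    check is always called at least once. clock and sleep are injectable so
    the helper can be exercised without real delays.
    """
    deadline = clock() + timeout
    while True:
        result = check()
        if result is not None:
            return result
        if clock() >= deadline:
            raise TimeoutError(f"element did not appear within {timeout} seconds")
        sleep(interval)


if __name__ == "__main__":
    # Simulate an element that shows up on the third look at the page.
    attempts = iter([None, None, "@e4"])
    print(wait_for(lambda: next(attempts), interval=0.01))
```

Raising TimeoutError instead of returning None keeps the failure explicit, so a page that never finishes rendering is not mistaken for an empty result.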

Tips

  • The agent uses the browser proactively when a task would benefit from it — you don’t need to say “use the browser.” Asking “what’s the weather in Tokyo” or “fill out the form on that page” is enough.
  • For sites that require login, the agent can save auth state and reuse it. If you use a particular site often, the first session handles login and subsequent sessions skip it.
  • Screenshots are saved inside the container. If you need them persisted, make sure the output directory is in a mounted path.