browser-automation
Automate web browser interactions using natural language via CLI commands. Use when the user asks to browse websites, navigate web pages, extract data from websites, take screenshots, fill forms, click buttons, or interact with web applications.
Permissions
Risk Assessment
This skill requests 3 of 4 possible permissions. Elevated scope — ensure each permission is justified. Consider running in a sandbox.
SKILL.md
Automate browser interactions using Stagehand CLI with Claude.
First: Environment Selection (Local vs Remote)
The skill automatically selects between local and remote browser environments:
- If Browserbase API keys exist (BROWSERBASE_API_KEY and BROWSERBASE_PROJECT_ID in .env file): Uses remote Browserbase environment
- If no Browserbase API keys: Falls back to local Chrome browser
- No user prompting: The selection happens automatically based on available configuration
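The selection rule above can be sketched in shell. This is only an illustration of the described behavior, not the skill's actual implementation: the function name `select_browser_env` and the plain `KEY=value` parsing of .env are assumptions.

```shell
#!/bin/sh
# Hypothetical helper: report which environment the rule above would pick.
# Assumes .env holds plain KEY=value lines (no quoting or "export" prefixes).
select_browser_env() {
  env_file="${1:-.env}"
  if [ -f "$env_file" ] &&
     grep -Eq '^BROWSERBASE_API_KEY=.+' "$env_file" &&
     grep -Eq '^BROWSERBASE_PROJECT_ID=.+' "$env_file"; then
    echo "remote"   # both keys present: Browserbase
  else
    echo "local"    # any key missing, or no .env: local Chrome
  fi
}
```

Calling `select_browser_env` with no argument inspects .env in the current directory, so it can be used to predict the mode before the first command runs.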
Setup (First Time Only)
Check setup.json in this directory. If setupComplete is false, run:
npm install # Install dependencies
npm link # Create global 'browser' command
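The setup check can be scripted along these lines. The sketch assumes setup.json stores the flag as a literal `"setupComplete": true` or `"setupComplete": false` pair and treats a missing file as "setup needed"; both points, and the helper name `needs_setup`, are assumptions rather than documented behavior.

```shell
#!/bin/sh
# Hypothetical helper: exit 0 when first-time setup is still needed.
# Uses a crude text match instead of a real JSON parser.
needs_setup() {
  setup_file="${1:-setup.json}"
  [ -f "$setup_file" ] || return 0
  if grep -Eq '"setupComplete"[[:space:]]*:[[:space:]]*true' "$setup_file"; then
    return 1   # setup already completed
  fi
  return 0
}

# Usage (not run here): needs_setup && npm install && npm link
```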
Commands
All commands work identically in both modes:
browser navigate <url> # Go to URL
browser act "<action>" # Natural language action
browser extract "<instruction>" ['{}'] # Extract data (optional schema)
browser observe "<query>" # Discover elements
browser screenshot # Take screenshot
browser close # Close browser
Quick Example
browser navigate https://example.com
browser act "click the Sign In button"
browser extract "get the page title"
browser close
Mode Comparison
| Feature | Local | Browserbase |
|---|---|---|
| Speed | Faster | Slightly slower |
| Setup | Chrome required | API key required |
| Stealth mode | No | Yes |
| Proxy/CAPTCHA | No | Yes |
| Best for | Development | Production/scraping |
Best Practices
- Always navigate first before interacting
- View the screenshot after each command to verify the result
- Be specific in action descriptions
- Close browser when done
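These practices can be combined into one small wrapper. The sketch below uses only the commands listed earlier; the function name `run_session` and the `BROWSER_CMD` override (handy for dry runs with a stub) are assumptions, and it presumes each command reports failure through a non-zero exit status.

```shell
#!/bin/sh
# BROWSER_CMD is a hypothetical override; it defaults to the 'browser' command.
BROWSER_CMD="${BROWSER_CMD:-browser}"

# Navigate first, act, take a screenshot to verify, and always close.
run_session() {
  url="$1"
  action="$2"
  status=0
  "$BROWSER_CMD" navigate "$url" &&
    "$BROWSER_CMD" act "$action" &&
    "$BROWSER_CMD" screenshot || status=$?
  "$BROWSER_CMD" close   # runs whether or not the steps above succeeded
  return "$status"
}
```

For example, `run_session https://example.com "click the Sign In button"` stops at the first failing step, still closes the browser, and returns that step's exit status.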
Troubleshooting
- Chrome not found: Install Chrome or use Browserbase mode
- Action fails: Use browser observe to discover available elements
- Browserbase fails: Verify API key and project ID are set
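The "action fails" advice can be turned into a fallback pattern. This is a sketch under the assumption that `browser act` exits with a non-zero status on failure, which the source does not confirm; `act_or_observe` and `BROWSER_CMD` are hypothetical names.

```shell
#!/bin/sh
# BROWSER_CMD is a hypothetical override; it defaults to the 'browser' command.
BROWSER_CMD="${BROWSER_CMD:-browser}"

# Try an action; if it fails, list candidate elements to help rephrase it.
act_or_observe() {
  action="$1"
  query="${2:-$1}"   # observe query defaults to the action text
  if "$BROWSER_CMD" act "$action"; then
    return 0
  fi
  echo "act failed; observing: $query" >&2
  "$BROWSER_CMD" observe "$query"
  return 1
}
```

The function still returns a failure status after observing, so a calling script can decide whether to retry with a more specific description.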
For detailed examples, see EXAMPLES.md. For API reference, see REFERENCE.md.
Why You Need browser-automation
Most modern work happens in a web browser, but automating browser interactions has traditionally meant writing complex Selenium or Playwright scripts. Browser Automation lets you control a web browser using natural language through OpenClaw, making web scraping, form filling, testing, and data extraction as simple as describing what you want.
The skill uses Stagehand CLI with Claude to translate natural language commands into browser actions. It automatically selects between a local Chrome browser and Browserbase's cloud browser based on your configuration, so it works both for local development and headless server environments.
With 546 downloads, this skill suits anyone who needs to automate web interactions without writing browser automation code from scratch.
Common Use Cases
- Scrape data from websites using natural language descriptions
- Fill forms, click buttons, and navigate web applications automatically
- Take screenshots of web pages for documentation or testing
- Test web applications by describing user flows in plain English
- Extract structured data from web pages into usable formats
Frequently Asked Questions
Does it work without Browserbase?
Yes. If no Browserbase API keys are configured, it automatically falls back to your local Chrome browser. Browserbase is optional and intended for cloud or headless use.
Why does it need shell access?
The skill needs shell access to launch and control the browser process through the Stagehand CLI. It also needs network access to interact with web pages.
Can it handle login-protected pages?
Yes. It can fill in login forms, handle authentication flows, and maintain session state across multiple page navigations.
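A login flow might look like the sketch below, built only from the documented navigate, act, and extract commands. The URL, the field descriptions, the `APP_USER` and `APP_PASS` variable names, and the `BROWSER_CMD` override are placeholders chosen for illustration, and how the action text is interpreted depends on the page.

```shell
#!/bin/sh
# BROWSER_CMD is a hypothetical override; it defaults to the 'browser' command.
BROWSER_CMD="${BROWSER_CMD:-browser}"

# Hypothetical login flow: sign in, then read something from the next page.
# Credentials come from the APP_USER and APP_PASS environment variables.
login_and_extract() {
  login_url="$1"
  "$BROWSER_CMD" navigate "$login_url" &&
    "$BROWSER_CMD" act "type '$APP_USER' into the username field" &&
    "$BROWSER_CMD" act "type '$APP_PASS' into the password field" &&
    "$BROWSER_CMD" act "click the Sign In button" &&
    "$BROWSER_CMD" extract "get the page title"
}
```

Because the browser stays open between commands, the final extract runs against the page reached after signing in. Reading credentials from the environment keeps them out of the script file, although they still appear in the action text passed to the CLI.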