AI Agents & Orchestration

Browser-Native Linux Agents

Last updated 2026-07-31

What's new

2026-07-31

Cursor (a tool that lets you talk to an AI to build apps) can now create a simple app, such as a calorie tracker, in under two minutes using a model called GPT 5.6 Soul (a type of AI).
The Cursor app has a new "agents view" (a window where you talk to the AI) and workspaces (folders to organize big projects) to help manage different chat sessions (conversations with the AI).
You can download Cursor from cursor.com. It runs on Mac and other platforms and has a simple sign-in process.
Cursor's updates make it easier to build and manage apps, even for beginners, with features like pinning and renaming chat sessions.

2026-07-16

OpenAI's Codex (a tool for generating code) and Anthropic's Claude (a chatbot) now have in-app browsers that can open multiple tabs, letting you manage tasks like business, marketing, and content creation in one place.
These AI tools can act as "agents" (helpers that perform tasks for you), opening and managing different browser tabs based on your requests, like showing YouTube analytics, emails, or even sports scores.
Claude and Codex can now access regular websites, not just local ones, and you can ask them to open relevant links quickly, changing how you use computers for tasks.
Both tools have annotation features (like Excalidraw, a drawing tool) that let you mark up web pages and communicate with the AI, and Codex can even write directly into Google Docs using its in-app browser.

2026-07-13

Clay (a service that finds and enriches business contact info) and Claude Code (an AI assistant) can work together to find leads, enrich their data, and create personalized outreach messages.
Claude Code can understand and execute natural language instructions, like "find 50 leads that match this description," so you can use it without learning a new software interface.
Clay uses a "waterfall" approach: it checks multiple data sources in turn to find the best email address, raising match rates from around 30% to 80-90%.
To create effective personalized messages, Claude Code needs context about your business, like case studies, FAQs, and website copy, which you can provide in advance.
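
The waterfall idea above can be sketched in a few lines. This is only an illustration of the pattern, not Clay's implementation: the three providers are hypothetical stand-ins, and a real setup would call actual data sources.

```python
from typing import Callable, List, Optional, Tuple

# A provider takes (full name, company domain) and returns an email or None.
Provider = Callable[[str, str], Optional[str]]


def waterfall(name: str, domain: str,
              providers: List[Tuple[str, Provider]]) -> Tuple[Optional[str], Optional[str]]:
    """Try each provider in order and stop at the first one that finds an email."""
    for label, lookup in providers:
        email = lookup(name, domain)
        if email:
            return email, label
    return None, None


# Hypothetical providers, for illustration only.
def provider_a(name: str, domain: str) -> Optional[str]:
    return None  # this source has no match


def provider_b(name: str, domain: str) -> Optional[str]:
    return f"{name.split()[0].lower()}@{domain}"


def provider_c(name: str, domain: str) -> Optional[str]:
    return f"contact@{domain}"  # never consulted when provider_b succeeds


result = waterfall("Ada Lovelace", "example.com", [
    ("provider_a", provider_a),
    ("provider_b", provider_b),
    ("provider_c", provider_c),
])
print(result)  # ('ada@example.com', 'provider_b')
```

Each extra source only runs when the earlier ones come back empty, which is why adding sources raises the match rate without querying every source for every lead.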

2026-07-01

**Domain-specific agents** (AI tools designed for specific tasks or industries) are becoming crucial, as businesses want to integrate their unique data with AI for better efficiency.
**Agents** (software that uses AI to complete tasks) are hard to build, requiring careful management of the "agentic loop" (the process the agent follows to complete tasks) and other complex challenges.
**Tools like Vercel AI SDK** (a set of pre-built code for common AI tasks) and Eve (a framework for building agents) are emerging to help simplify the agent-building process.
**Many businesses**, from small agencies to large corporations, are trying to build their own custom agents to leverage AI for significant gains.

2026-06-19

SpaceX (a company that makes rockets and other tech) bought Cursor (a tool that helps people code with AI) for $60 billion, aiming to make it a top AI platform for everyone, not just coders.
Cursor is now improving fast and could soon rival other AI tools like Codex and Claude (both are AI helpers for coding and work tasks), thanks to SpaceX's powerful computers and data.
Cursor's new features make it great for coding and general work, but it still can't create documents like Codex and Claude can, which might change soon.
SpaceX and Cursor are working together to train better AI models, with the goal of making Cursor a one-stop shop for work, competing with other AI "super apps."

2026-06-13

WebMCP (short for Web Model Context Protocol) is a new web standard that lets websites offer AI agents (software that acts on your behalf) a clear menu of tools and actions, making it easier for them to navigate and interact with the site.
To prepare for AI agents, focus on making your website accessible to everyone, using clear HTML, strong accessibility standards, and fast loading times, as this also benefits AI agents.
WebMCP improves the performance and reliability of AI agents by allowing websites to define their capabilities as structured tools, reducing the need for agents to guess or work around site features.
The Model Context Tool Inspector is a Chrome extension that shows the tools available on a website for AI agents to use, helping developers see and test how their site interacts with AI.

2026-06-10

Cloudflare has been building AI agents using "durable objects" (a tech that maintains state and runs continuously), which has proven successful and is used by companies like PayPal and Intercom.
These agents can run tasks in the background, like scheduling weekly reports, and can sync across devices with low latency (under 16ms), as seen in apps like tldraw.
Cloudflare offers a full-stack solution for creating these agents, including tools for scheduling tasks and integrating with other services, making it easier to build and deploy AI-powered tools.
They've also developed a robust backend for Vercel's AI SDK (a set of tools for building AI applications), enhancing its capabilities for production use.

2026-06-07

Google has created a special version of Chrome DevTools (a set of tools for web developers to debug and optimize websites) designed for AI agents, helping them analyze and improve web page performance.
Instead of overwhelming agents with massive data files, Google now provides summarized, easy-to-understand performance metrics in a format called semantic summaries, making it simpler for agents to process and act on the information.
When designing tools for AI agents, it's important to consider their unique cognitive abilities and limitations, focusing on both effectiveness (completing tasks) and efficiency (using resources wisely).
These tools are compatible with various AI agent platforms, such as Gemini CLI (a command-line interface for AI agents), Claude Code, and OpenClaw (a platform for building and deploying AI agents).

2026-06-04

OpenAI's GPT 5.6 (their next major AI model) might launch soon, with test versions already appearing in ChatGPT that can generate playable games and cleaner-looking apps.
Codex, OpenAI's AI coding tool, got a big update adding plugins (add-on features) for non-coders like marketers and a "sites" feature to create shareable apps and dashboards.
The first "vibe coding" platform and benchmark (a way to compare AI models for different tasks) launched. It lets you test which model works best, and some of its features are free.

2026-06-03

Claude Code (AI assistant for coding) can automatically pull SEC filing data from government websites without you manually visiting each page.
Combine Claude Code with Supabase (cloud database), Trigger.dev (automation scheduler), and Slack (messaging app) to extract, store, and monitor data daily.
For websites that block automated data pulling, Browser Use (an AI agent that clicks pages like a human) provides an alternative method.

Key points

What it is

Browser-native Linux agents are AI tools that control a browser and automate web tasks from the terminal (command-line interface).
They use a snapshot command to return a compact accessibility tree (a clean list of interactive elements) instead of raw DOM data (the full HTML structure of a page).
This allows AI agents to reason about and interact with web pages using natural language commands.
Over 60 commands are available, including navigation, clicking, form filling, and taking screenshots.

How to use it

Install the tool with the terminal command `npm install -g agent-browser`.
Use the `snapshot` command to get a compact accessibility tree, which your AI agent can read and interact with.
Give your AI agent natural language instructions to open URLs, click buttons, fill forms, and take screenshots.
Integrate the tool with AI coding tools like Claude Code, Cursor, GitHub Copilot, or OpenAI Codex for seamless automation.

Watch out for

Avoid overwhelming the AI with noise by using the snapshot command instead of raw DOM data.
Don't write fragile automation scripts using CSS selectors that break easily; instead, pair custom browser scripts with reusable skills.
Always clean up sessions gracefully so that no orphaned browser processes are left running.

Tools named

Agent Browser (browser automation tool for AI agents), Playwright (a tool for web testing and automation)

Lesson 1: What Browser-Native Linux Agents are and why they matter

Browser-native Linux agents are AI tools that control a browser and let you automate web tasks directly from the terminal. Instead of writing complex scripts, you tell your AI agent to run a command, and the command controls the browser for you. This matters because older tools like Puppeteer were built for humans writing deterministic scripts, not for AI agents that work from natural language.

The key innovation is the snapshot command, which returns a compact accessibility tree instead of dumping thousands of tokens of raw DOM data. That tree gives AI agents a clean, structured interface they can actually reason about, turning the entire web into interactive tooling. There are over 60 commands, including navigation, clicking, form filling, dragging, uploading, taking screenshots, and even network interception.
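
To make the idea of a compact accessibility tree concrete, here is a toy sketch that reduces a small HTML page to a short list of interactive elements. It is not the tool's actual algorithm or output format: the `@e1`-style refs, the role names, and the choice to treat every `<input>` as a textbox are simplifying assumptions for illustration.

```python
from html.parser import HTMLParser

# Tags this toy treats as interactive (an assumption, not the tool's real rule set).
ROLES = {"a": "link", "button": "button", "input": "textbox"}


class SnapshotParser(HTMLParser):
    """Collects interactive elements and gives each one a short ref."""

    def __init__(self):
        super().__init__()
        self.entries = []   # finished lines of the compact tree
        self._open = None   # (tag, role, label, text_parts) while inside <a> or <button>

    def handle_starttag(self, tag, attrs):
        role = ROLES.get(tag)
        if role is None:
            return          # layout and text tags are dropped entirely
        attr = dict(attrs)
        label = attr.get("aria-label") or attr.get("placeholder") or ""
        if tag == "input":  # inputs have no closing tag, so record them right away
            self._emit(role, label)
        else:
            self._open = (tag, role, label, [])

    def handle_data(self, data):
        if self._open is not None and data.strip():
            self._open[3].append(data.strip())

    def handle_endtag(self, tag):
        if self._open is not None and tag == self._open[0]:
            _, role, label, parts = self._open
            self._emit(role, label or " ".join(parts))
            self._open = None

    def _emit(self, role, name):
        ref = f"@e{len(self.entries) + 1}"
        self.entries.append(f'{ref} {role} "{name}"')


def snapshot(html: str) -> str:
    parser = SnapshotParser()
    parser.feed(html)
    return "\n".join(parser.entries)


PAGE = """
<div class="wrap"><div class="hero"><h1>Welcome</h1><p>Lots of text</p></div>
<form><input type="email" placeholder="Email">
<button class="btn-x91">Sign in</button></form>
<a href="/help">Help</a></div>
"""
print(snapshot(PAGE))
```

The page's wrappers, headings, and class names disappear, and the agent is left with three lines it can act on: a textbox, a button, and a link.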

This approach works with any AI coding tool that can run shell commands, including Claude Code, Cursor, GitHub Copilot, and OpenAI Codex. Setup takes one command, `npx` followed by the tool name, and you can be automating in about 30 seconds. The tool is open source under the Apache 2.0 license and has over 15,000 GitHub stars, a sign that it is becoming standard infrastructure rather than a niche tool.

Browser-native Linux agents solve a fundamental problem: AI agents don't think in CSS selectors. They need browser automation built from the ground up for the age of AI, which is exactly what this tool provides.

Lesson 2: How to use Browser-Native Linux Agents: step-by-step

First, install the tool with a single terminal command: `npm install -g agent-browser`. This gives you a CLI tool (a command-line interface that runs in your terminal) that lets AI agents control a browser using simple commands, with no complicated setup scripts needed.

The key command is `snapshot`. Instead of dumping the entire messy DOM (the full HTML structure of a webpage), it returns a compact accessibility tree (a clean list of buttons, links, and inputs). Your AI agent reads this snapshot, sees what it can interact with, and picks an action.

Here’s a concrete 5-step workflow. Tell your agent to: 1) open a URL, 2) take a snapshot, 3) click a specific labeled button, 4) fill in a form field, and 5) take a screenshot for verification. That’s it. Compare this to older tools like Puppeteer, which required launching a browser, creating a context, opening a page, and waiting for selectors, which adds up to many more steps.
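
The five steps can be written out as the commands an agent would issue. The sketch below only builds and prints the command lines; it does not run them. The subcommand names (`open`, `snapshot`, `click`, `fill`, `screenshot`) and the `@e2`-style element refs are assumptions based on the description above, so check them against the tool's own help output before relying on them. The URL, refs, and file name are placeholders.

```python
import shlex


def build_workflow(url, button_ref, field_ref, value, shot_path):
    """Return the five steps as argument lists; nothing is executed here."""
    cli = "agent-browser"
    return [
        [cli, "open", url],               # 1) open a URL
        [cli, "snapshot"],                # 2) list the interactive elements
        [cli, "click", button_ref],       # 3) click a labeled button
        [cli, "fill", field_ref, value],  # 4) fill in a form field
        [cli, "screenshot", shot_path],   # 5) capture the result for verification
    ]


# In practice the agent picks the refs after reading the snapshot from step 2;
# they are hard-coded here only to keep the example short.
steps = build_workflow("https://example.com/login", "@e2", "@e3",
                       "me@example.com", "after.png")
for argv in steps:
    print(shlex.join(argv))
```

Keeping each step as a separate one-line command is what lets an agent decide the next action after seeing the result of the previous one.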

This tool works with any AI system that can run shell commands (instructions typed in the terminal), such as Claude Code, Cursor, GitHub Copilot, or OpenAI Codex. To make it permanent, add the core workflow instructions to your `claude.md` or `agents.md` file so your agent always knows the commands. The skill updates automatically.
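
One way to add the workflow instructions without pasting them in twice is a small helper that appends the section only if it is missing. This is a sketch under assumptions: the heading and wording of the instructions are made up for illustration, and the file name follows the `agents.md` convention mentioned above.

```python
import tempfile
from pathlib import Path

# Illustrative wording; adapt it to your own workflow.
HEADING = "## Browser automation"
BODY = (
    "Use the agent-browser CLI for web tasks:\n"
    "1. open the URL\n"
    "2. run snapshot and read the element list\n"
    "3. act on elements from the snapshot, then take a screenshot to verify\n"
)


def ensure_section(path: Path, heading: str = HEADING, body: str = BODY) -> bool:
    """Append the section once. Returns True if the file was changed."""
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    if heading in text.splitlines():
        return False  # already present, leave the file alone
    # Keep one blank line between existing content and the new section.
    if text and not text.endswith("\n"):
        sep = "\n\n"
    else:
        sep = "\n" if text else ""
    path.write_text(text + sep + heading + "\n\n" + body, encoding="utf-8")
    return True


# Demo in a temporary folder so no real project file is touched.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "agents.md"
    print(ensure_section(target))  # True: section written
    print(ensure_section(target))  # False: nothing to do
```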

Practical uses include quality assurance testing (having the agent click through a multi-page form to find bugs), automating repetitive tasks like downloading reports from sites without APIs, and monitoring web pages. You give your agent natural language instructions, and it handles the browser automation itself.

Lesson 3: Best practices and pitfalls

When using browser-native Linux agents (AI that controls a browser directly on Linux), the biggest pitfall is overwhelming the AI with noise. Traditional browser automation dumps the entire DOM (the full HTML structure of a page), which can run to 15,000 tokens of mostly irrelevant markup. The AI can burn through much of its context window just to find a login button. A best practice is to use a snapshot command that returns a compact accessibility tree (a simplified list of interactive elements) instead, so the agent only sees what it can click or type into.
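
A back-of-the-envelope comparison shows the scale of the difference. The sketch below uses a crude rule of thumb of about four characters per token, which is an assumption rather than any model's real tokenizer, and the page is synthetic: 300 wrapper elements around a single login button.

```python
def rough_tokens(text: str) -> int:
    """Very rough estimate: about four characters per token."""
    return max(1, len(text) // 4)


# Synthetic page: many styling wrappers and one element the agent actually needs.
wrapper = '<div class="css-1x2y3z layout-row" data-track="impression">placeholder copy</div>\n'
raw_dom = wrapper * 300 + '<button id="login" class="btn btn-primary">Log in</button>\n'

# The same page reduced to what the agent can act on (illustrative format).
compact = '@e1 button "Log in"'

print("raw DOM:", rough_tokens(raw_dom), "tokens (estimated)")
print("snapshot:", rough_tokens(compact), "tokens (estimated)")
```

The exact numbers depend on the page and the tokenizer, but the gap of several orders of magnitude is the point: the compact form leaves the context window free for the task itself.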

Another common mistake is writing fragile automation scripts. CSS selectors that break every time a site updates, or a reliance on older WebDriver protocols, lead to constant maintenance. Instead, pair custom browser scripts with a skill (a reusable, repeatable action) to make the process consistent. For example, have the agent write its own Playwright CLI scripts for testing. This allows the AI to automate QA by spinning up headed or headless browsers (browsers with or without a visible window) and testing different parts of an app simultaneously.
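
The fragility of CSS selectors is easy to show with a small model of a page before and after a redesign. The `Element` record and both lookup helpers below are hypothetical, written only to illustrate why matching on role and visible name tends to survive changes that break class-based selectors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    ref: str        # short handle from a snapshot, e.g. "@e2"
    role: str       # accessibility role
    name: str       # accessible name (the visible label)
    css_class: str  # styling hook that tends to change with redesigns


def find_by_class(elements, css_class):
    """Selector-style lookup: depends on a styling detail."""
    return next((e for e in elements if e.css_class == css_class), None)


def find_by_role_name(elements, role, name):
    """Lookup by what the user sees: role plus label, case-insensitive."""
    return next((e for e in elements
                 if e.role == role and e.name.lower() == name.lower()), None)


before = [Element("@e1", "textbox", "Email", "inp-a1"),
          Element("@e2", "button", "Sign in", "btn-x91")]
# Same page after a redesign: class names changed, labels did not.
after = [Element("@e1", "textbox", "Email", "inp-b7"),
         Element("@e2", "button", "Sign in", "btn-q42")]

print(find_by_class(after, "btn-x91"))                    # None
print(find_by_role_name(after, "button", "sign in").ref)  # @e2
```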

The main engineering advantage comes from treating browser control as a CLI tool: one command, one action, no boilerplate. A fast Rust-based CLI avoids the overhead of JavaScript daemons. Best practice: always clean up sessions gracefully rather than force-killing them, which can leave orphaned processes behind. Fork your session with a slash command to branch off experiments without breaking the main task. For side queries, use a quick question command so the agent can keep working on the main task without interruption.
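
Graceful cleanup follows a common pattern: ask the process to exit, wait a moment, and force-kill only as a last resort. The sketch below uses Python's standard `subprocess` module with a sleeping child process as a stand-in for a long-lived browser session. It shows the general pattern, not how the tool itself shuts down.

```python
import subprocess
import sys


def stop_gracefully(proc: subprocess.Popen, timeout: float = 5.0) -> str:
    """Ask the process to exit, wait, and force-kill only if it ignores the request."""
    if proc.poll() is not None:
        return "already exited"
    proc.terminate()                  # polite request (SIGTERM on POSIX)
    try:
        proc.wait(timeout=timeout)
        return "terminated"
    except subprocess.TimeoutExpired:
        proc.kill()                   # last resort
        proc.wait()                   # collect the exit status so nothing lingers
        return "killed"


# Stand-in for a browser session: a child process that sleeps for a minute.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
try:
    pass  # the agent's browser work would happen here
finally:
    # Runs even if the work above raises, so the session is never left behind.
    print(stop_gracefully(child))
```

Putting the cleanup in a `finally` block is the design choice that matters: the session is closed whether the task succeeds or fails.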
