Microsoft is taking its first serious swing at making “computer-using AI” practical on everyday devices with the launch of Fara-7B, a compact agentic model designed to operate a computer much like a human would. After rolling out the Phi family of small language models earlier this year, the company is now expanding into a new class of AI, one that can click, scroll, type, navigate websites, and complete tasks directly on a screen.
Unlike most AI tools that stop at generating text, Fara-7B goes several layers deeper. It can see a webpage, interpret what’s on it, and interact through predicted mouse and keyboard actions. And even with just 7 billion parameters, it performs competitively with far larger systems that typically depend on multiple models and cloud-heavy workflows.
Because it’s small enough to run locally, Fara-7B offers faster responses, improved privacy, and genuine device-level autonomy, all without sending user data off the machine.
Microsoft is positioning Fara-7B as an experimental, hands-on agent for people who want to explore AI that handles everyday online tasks.
Think of activities like:
- Filling in forms
- Searching for information
- Comparing prices
- Booking trips or reservations
- Managing accounts online
The company strongly recommends running it in a sandboxed environment, keeping an eye on what it’s doing, and avoiding sensitive workflows. It’s still a research preview, not a polished consumer product.
How Fara-7B Works
Fara-7B doesn’t rely on hidden metadata or accessibility layers. It simply views screenshots of the browser, just as a human would, and decides what to do next.
Its training relied heavily on a large synthetic dataset generated from real websites and user-like tasks. Microsoft created a multi-agent system that:
- Proposed tasks using public URLs across categories (shopping, travel, movies, restaurants, etc.).
- Solved the tasks using coordinated agents that planned, explored pages, and simulated user input.
- Verified results to ensure only successful, accurately completed demonstrations were used.
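The propose → solve → verify pipeline above can be sketched as a simple filtering loop. This is purely illustrative: the function names, task templates, and success check below are invented stand-ins, not Microsoft's actual multi-agent system, which has not been released as a public API.

```python
# Hypothetical sketch of a propose -> solve -> verify data pipeline.
# All names and logic here are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class Trajectory:
    task: str
    steps: list = field(default_factory=list)  # (observation, action) pairs
    success: bool = False

def propose_tasks(seed_urls):
    # Stand-in for the task-proposal agent sampling public URLs by category.
    return [f"Find the cheapest flight on {url}" for url in seed_urls]

def solve(task):
    # Stand-in for the coordinated solver agents (plan, explore, simulate input).
    traj = Trajectory(task=task)
    traj.steps.append(("screenshot_0", "click(search_box)"))
    traj.steps.append(("screenshot_1", f"type({task!r})"))
    traj.success = "flight" in task  # toy success signal
    return traj

def verify(traj):
    # Keep only demonstrations judged successful and complete.
    return traj.success and len(traj.steps) > 0

def build_dataset(seed_urls):
    return [t for t in (solve(task) for task in propose_tasks(seed_urls))
            if verify(t)]

dataset = build_dataset(["example-travel.com", "example-shop.com"])
```

The key design point is the final filter: only trajectories that pass verification enter the training set, which is how the real pipeline keeps low-quality demonstrations out of the 145K trajectories.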
In total, Microsoft trained Fara-7B on 145,000 interaction trajectories containing 1 million individual steps, along with additional data for grounding UI elements, captioning, and visual QA.
Instead of using this multi-agent setup at runtime, the team distilled all of those behaviors into a single model. Fara-7B was trained with a supervised “observe → think → act” pattern, predicting both reasoning and actions at each step. It’s built on Qwen2.5-VL-7B, chosen for its strong grounding abilities and long-context support (up to 128k tokens).
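At inference time, that distilled "observe → think → act" cycle amounts to a loop: screenshot in, reasoning plus one action out, repeat until the task ends. The sketch below is a minimal mock of that control flow; the model call, action schema, and stopping logic are invented for illustration and do not reproduce Fara-7B's real I/O format.

```python
# Minimal observe -> think -> act loop for a screenshot-driven agent.
# model_predict is a hypothetical stub standing in for the VLM.
def capture_screenshot(state):
    # Observe: in a real agent this would grab browser pixels.
    return f"pixels@step{state['step']}"

def model_predict(screenshot, goal):
    # Think: the model emits reasoning plus a single predicted action.
    if "form" in goal:
        return "Form field visible; type into it.", {"type": "type", "text": "Jane Doe"}
    return "Task looks complete.", {"type": "stop"}

def execute(action, state):
    # Act: apply the predicted mouse/keyboard action.
    state["log"].append(action["type"])
    if action["type"] == "stop":
        state["done"] = True

def run_agent(goal, max_steps=5):
    state = {"step": 0, "done": False, "log": []}
    while not state["done"] and state["step"] < max_steps:
        shot = capture_screenshot(state)             # observe
        thought, action = model_predict(shot, goal)  # think
        execute(action, state)                       # act
        state["step"] += 1
        if action["type"] == "type":
            goal = "done"  # toy: one typed field finishes the task
    return state

result = run_agent("fill the form")
```

A step cap (`max_steps`) matters in practice: agents that hallucinate or loop need a hard bound so a bad prediction can't run indefinitely.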
Performance and Limitations
Microsoft evaluated Fara-7B on established benchmarks like WebVoyager, Mind2Web, and DeepShop, and introduced a new benchmark, WebTailBench, to cover real-world tasks that existing tests overlook, such as booking tickets, making reservations, comparing prices, or applying for jobs.
In an independent Browserbase evaluation using human annotators, Fara-7B scored 62% on WebVoyager under standardized conditions.
While impressive for its size, the model still shares the limitations seen in many web-agent systems: occasional instruction errors, hallucinations, and difficulty with complex, multi-step tasks.
Fara-7B marks Microsoft’s first substantial move toward shrinking computer-use agents into something lightweight enough to run locally yet capable enough to handle meaningful workflows. It’s early, experimental, and imperfect, but as developers test it, push it, and uncover its edges, Fara-7B could help shape the next wave of practical, everyday AI agents.
