Meet OpenJarvis: A Local-First Framework for On-Device Personal AI Agents with Tools, Memory, and Learning
Researchers at Stanford University and Lambda Labs have released the research paper for OpenJarvis, an open-source framework that runs inference, agents, memory, and learning entirely on-device.
The open-weight models configured through OpenJarvis land within 3.2 percentage points of the best cloud model on average, at roughly 800× lower marginal API cost per query and roughly 4× lower latency under the paper's benchmark protocol. This work builds on the research group's earlier Intelligence Per Watt study, which reported that local models already handle 88.7% of single-turn chat and reasoning queries at interactive latency, with intelligence efficiency improving 5.3× from 2023 to 2025.
Model Overview & Access
OpenJarvis is not a single model. It is a framework that composes any supported model with a configurable agent stack, evaluated across 11 local models from 4 families.
| Property | Value |
|---|---|
| License | Apache 2.0 |
| Framework launch | March 12, 2026 |
| Paper | arXiv:2605.17172 (posted May 16, 2026) |
| Repository | github.com/open-jarvis/OpenJarvis |
| Stars / forks | ~5.4k / ~1.2k (June 2026) |
| Languages | Python (~83%), Rust (~9%), TypeScript (~7%) |
| Evaluated models | 11 local models across 4 families: Qwen3.5, Gemma4, Nemotron, Granite |
| Cloud baselines | Claude Opus 4.6, GPT-5.4, Gemini 3.1 Pro |
| Supported engines | Ollama, vLLM, SGLang, llama.cpp, Apple Foundation Models, Exo (among others) |
| Context window | Model-dependent |
| Installation | Single command; ~3 minutes on broadband |
| Hardware | Tested on 7 platforms, from Mac Mini M4 to NVIDIA DGX Spark |
Architecture: Five Primitives and a Spec
OpenJarvis decomposes a personal AI system into 5 typed primitives, composed through a single declarative configuration object called a spec.
- Intelligence: the model, weights, generation parameters, and quantization format.
- Engine: the inference runtime (Ollama, vLLM, SGLang, etc.), batching, KV-cache settings, and hardware path.
- Agents: the reasoning loop (ReAct or CodeAct), system prompts, tool-use policy, and turn limits.
- Tools & Memory: external interfaces, retrieval backends, 25+ data connectors, and 32+ messaging channels, with native MCP support and interchangeable memory backends.
- Learning: the optimizer that updates the spec from traces. This slot accepts LoRA, DSPy, GEPA, or LLM-guided spec search.
Each primitive is independently swappable, and a spec serializes all 5 into a TOML file. Two specs can share the same agent and tool configuration and differ only in model and engine, so the same behavior runs on a Mac Mini and a workstation without rewriting prompts.
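To make the "same agent, different model and engine" idea concrete, here is a minimal Python sketch of a spec as nested typed records. Every class name, field name, and value below (including the quantization labels, tool names, and memory backend) is hypothetical and chosen for illustration; it is not the OpenJarvis schema or API, which the article does not show.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Intelligence:
    model: str
    quantization: str  # hypothetical field


@dataclass(frozen=True)
class Engine:
    runtime: str  # e.g. "ollama" or "vllm"


@dataclass(frozen=True)
class Agent:
    loop: str  # "react" or "codeact"
    max_turns: int


@dataclass(frozen=True)
class ToolsMemory:
    tools: tuple[str, ...]
    memory_backend: str


@dataclass(frozen=True)
class Learning:
    optimizer: str


@dataclass(frozen=True)
class Spec:
    intelligence: Intelligence
    engine: Engine
    agent: Agent
    tools_memory: ToolsMemory
    learning: Learning


# A small-device spec (all values are illustrative assumptions).
mac_mini = Spec(
    intelligence=Intelligence(model="qwen3.5-9b", quantization="q4"),
    engine=Engine(runtime="ollama"),
    agent=Agent(loop="react", max_turns=8),
    tools_memory=ToolsMemory(tools=("calendar", "email"), memory_backend="sqlite"),
    learning=Learning(optimizer="spec_search"),
)

# Swap only the model and engine; agent, tools, and learning carry over unchanged.
workstation = replace(
    mac_mini,
    intelligence=Intelligence(model="qwen3.5-122b", quantization="fp8"),
    engine=Engine(runtime="vllm"),
)
```

Because the records are frozen, `replace` returns a new spec and leaves the original untouched, which mirrors the idea that two specs can differ in exactly the primitives that were swapped.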
LLM-guided spec search is the second contribution. It is a local–cloud collaboration: a frontier cloud model acts as a teacher at search time, reading traces, diagnosing failure clusters, and proposing edits across Intelligence, Engine, Agents, and Tools & Memory. An edit is accepted only if it improves the target failure cluster without causing meaningful regressions elsewhere; the research team calls this the gate (default tolerance 1%). The optimized spec then runs entirely on-device at inference time, with zero cloud calls. The teacher is used only at search time; at 100 queries per day, the amortized teacher cost falls below $0.001 per query within six months.
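The gate described above can be sketched as a simple acceptance check. This is an illustrative reading, not the paper's implementation: the function name, the per-cluster accuracy dictionaries, and the interpretation of the 1% tolerance as an absolute drop of 0.01 in accuracy are all assumptions.

```python
def passes_gate(before, after, target, tolerance=0.01):
    """Decide whether a proposed spec edit should be accepted.

    before / after: dicts mapping failure-cluster name -> accuracy in [0, 1],
    measured without and with the proposed edit (hypothetical data layout).
    target: the cluster the edit was meant to fix.
    tolerance: largest accepted accuracy drop on any other cluster
               (assumed here to be absolute, i.e. 0.01 = one point).
    """
    # The edit must actually improve the cluster it targets.
    if after[target] <= before[target]:
        return False
    # No other cluster may regress by more than the tolerance.
    return all(
        before[name] - after[name] <= tolerance
        for name in before
        if name != target
    )


baseline = {"tool_args": 0.60, "long_context": 0.80}
small_regression = {"tool_args": 0.70, "long_context": 0.795}
large_regression = {"tool_args": 0.70, "long_context": 0.75}

accepted = passes_gate(baseline, small_regression, target="tool_args")
rejected = passes_gate(baseline, large_regression, target="tool_args")
```

In this toy data the first edit is accepted because the non-target cluster drops by only half a point, while the second is rejected because it trades the target gain for a five-point regression elsewhere.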
Prior work (GEPA, DSPy, LoRA) optimizes one primitive at a time, and prompt optimizers alone recover only about 5 pp of the cloud–local gap. LLM-guided spec search recovers 13–32 pp because it edits across primitives jointly, at 7–11× lower optimization cost than single-primitive baselines. The four-primitive move space contributes 5.5–16.5 pp, and the LLM proposer adds about 10 pp on average over an evolutionary search on the same move space.

Capabilities & Performance
OpenJarvis was evaluated across 8 benchmarks spanning 508 tasks: tool calling (ToolCall-15), agentic workflows (PinchBench), coding (LiveCodeBench), customer service (τ-Bench V2, τ²-Bench Telecom), general assistance (GAIA), and deep research (LiveResearchBench, DeepResearchBench).
The swap test: Replacing the intended cloud model with Qwen3.5-9B in existing frameworks (OpenClaw, Hermes Agent) drops accuracy by 25–39 pp. With the same model under an OpenJarvis spec, the residual drop shrinks to 5.6–16.5 pp, recovering 56–77% of the portability loss.
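One plausible way to read "recovering a share of the portability loss" is as the fraction of the original accuracy drop that disappears under the spec. The sketch below uses that definition; the function name is hypothetical, and pairing the low ends of the two ranges together (and the high ends together) is an assumption, since the article does not say which framework or benchmark each endpoint belongs to.

```python
def recovered_fraction(drop_without_spec, residual_drop):
    """Share of the original accuracy drop (in pp) that the spec wins back."""
    return (drop_without_spec - residual_drop) / drop_without_spec


# Assumed pairing of the range endpoints quoted in the article.
low_end = recovered_fraction(25.0, 5.6)
high_end = recovered_fraction(39.0, 16.5)

print(f"{low_end:.1%} and {high_end:.1%} of the drop recovered")
```

Under this assumed pairing the two values come out near 78% and 58%, close to but not identical with the reported 56–77%, which suggests the paper's figures are computed per framework or per benchmark rather than from the range endpoints.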
The accuracy frontier: The best single local model, Qwen3.5-122B, reaches 80.3% average accuracy versus Claude Opus 4.6 at 83.5%, a 3.2 pp gap. Local specs match or exceed cloud on 4 of 8 benchmarks: ToolCall-15, PinchBench, LiveCodeBench, and τ-Bench V2.
Cost and latency: Local configurations form the accuracy–efficiency frontier. Qwen3.5-122B delivers its 80.3% at roughly a thousandth of a cent per query, versus $0.009 per query for Claude Opus 4.6, an approximately 800× marginal API-cost advantage. End-to-end latency drops by roughly 4× on the agentic workloads, though the paper notes single-shot prompts can favor cloud serving.
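The cost figures can be sanity-checked with a few lines of arithmetic. The sketch below takes the $0.009 cloud cost, the ~800× ratio, and the teacher-cost claim (under $0.001 per query at 100 queries per day within six months) from the article; treating six months as 182 days and treating the search as a one-time expense are assumptions made here for illustration.

```python
# Figures quoted in the article.
cloud_cost_usd = 0.009        # per query, Claude Opus 4.6
advantage_ratio = 800         # approximate local cost advantage

# Local per-query cost implied by the ratio.
implied_local_usd = cloud_cost_usd / advantage_ratio
implied_local_cents = implied_local_usd * 100

# Amortized teacher cost: queries served over roughly six months.
queries_per_day = 100
days = 182                    # assumption: six months ~ 182 days
total_queries = queries_per_day * days

# Largest one-time search spend that still amortizes below $0.001 per query.
max_search_budget_usd = 0.001 * total_queries

print(f"implied local cost: {implied_local_cents:.4f} cents per query")
print(f"queries in six months: {total_queries}")
print(f"search budget ceiling: ${max_search_budget_usd:.2f}")
```

The implied local cost is about 0.0011 cents, which agrees with "roughly a thousandth of a cent". Under the stated assumptions, the teacher-cost claim corresponds to a one-time search spend of roughly $18 or less.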
Search gains: LLM-guided spec search improves the Qwen3.5-9B student to 100% on PinchBench, 83% on LiveCodeBench, and 91% on LiveResearchBench. Across the full eight-benchmark suite, average gains per student model range from 13.1 to 31.5 pp. The authors report that these gains survive their robustness checks (reward-weight variants, search-seed variance, and random restarts).
How to Use it
Installation is one command. On macOS, Linux, or WSL2:
curl -fsSL https://open-jarvis.github.io/OpenJarvis/install.sh | bash
Windows users run an equivalent PowerShell script (irm … | iex). The installer provisions uv, a Python virtual environment, Ollama, and a starter model in about three minutes on broadband. A desktop GUI ships as a .dmg, .exe, .deb, .rpm, or .AppImage from the releases page.
After installation, jarvis starts a chat session. Starter presets cover common workflows:
jarvis init --preset morning-digest-mac # daily briefing with TTS
jarvis init --preset deep-research # multi-hop research with citations
jarvis init --preset code-assistant # agent with code execution and shell access
jarvis init --preset scheduled-monitor # stateful agent on a schedule
The framework ships with eight built-in agents across three execution modes: on-demand, scheduled, and continuous. It connects to 25+ data sources (Gmail, Calendar, iMessage, Notion, Obsidian, Slack, GitHub, and others) and exposes agents over 32+ messaging channels (WhatsApp, Telegram, Discord, iMessage, Signal, and others).
Skills can be imported from external catalogs (about 150 from Hermes Agent and about 13,700 community skills from OpenClaw), all following the agentskills.io specification. A jarvis optimize skills --policy dspy command refines them from local trace history.
Key Takeaways
- OpenJarvis runs inference, agents, memory, and learning fully on-device, landing within 3.2 pp of the best cloud model at ~800× lower marginal API cost and ~4× lower latency.
- A typed "spec" decomposes the stack into 5 swappable primitives (Intelligence, Engine, Agents, Tools & Memory, and Learning), serialized to portable TOML.
- LLM-guided spec search uses a frontier cloud model as a search-time teacher to recover 13–32 pp of the cloud–local gap at 7–11× lower optimization cost, then runs locally with zero cloud calls.
- Local specs match or exceed cloud on 4 of 8 benchmarks (ToolCall-15, PinchBench, LiveCodeBench, τ-Bench V2); the remaining gap concentrates on reasoning- and research-heavy tasks.
The post Meet OpenJarvis: A Local-First Framework for On-Device Personal AI Agents with Tools, Memory, and Learning appeared first on MarkTechPost.
