OpenJarvis: Personal AI, On Personal Devices

Jon Saad-Falcon; Avanika Narayan; Robby Manihani; Tanvir Bhathal; Herumb Shandilya; Hakki Orhun Akengin; Gabriel Bo; Andrew Park; Matthew Hart; Caia Costello; Chuan Li; Christopher Ré; Azalia Mirhoseini

doi:10.48550/arXiv.2605.17172

Jon Saad-Falcon* Stanford University

Avanika Narayan* Stanford University

Robby Manihani Stanford University

Tanvir Bhathal Stanford University

Herumb Shandilya Stanford University

Hakki Orhun Akengin Stanford University

Gabriel Bo Stanford University

Andrew Park Stanford University

Matthew Hart Stanford University

Caia Costello Stanford University

Chuan Li Stanford University

Christopher Ré Stanford University

Azalia Mirhoseini Stanford University

Preprint, 2026

OpenJarvis decomposes the personal AI stack into five typed primitives (Intelligence, Engine, Agents, Tools & Memory, Learning) and uses LLM-guided spec search to bring on-device accuracy within 3.2 pp of the best cloud baseline on average, while reducing marginal API cost by ~800x and end-to-end latency by 4x. The resulting spec runs entirely on-device at inference time.

Abstract

Personal AI stacks, like OpenClaw and Hermes Agent, are becoming central to daily work, yet they route nearly every query (often over sensitive local data) to cloud-hosted frontier models. Replacing frontier models with local models inside existing stacks does not work: swapping Claude Opus 4.6 for Qwen3.5-9B drops accuracy by 25-39 pp across personal AI tasks like PinchBench and GAIA. Existing stacks bundle agentic prompts, tool descriptions, memory configuration, and runtime settings around a specific cloud model. Only the prompts can be tuned, and state-of-the-art prompt optimizers close just 5 pp of the local-cloud gap on their own. This motivates a decomposed personal AI stack: one that exposes individual primitives which can be optimized individually or jointly to close the local-cloud gap. We present OpenJarvis, an architecture that represents a personal AI system as a typed spec over five primitives: Intelligence, Engine, Agents, Tools & Memory, and Learning. Each primitive is an independently editable field, making the stack end-to-end optimizable and measurable against accuracy, cost, and latency. Towards closing the local-cloud gap without surrendering local-model properties, OpenJarvis introduces LLM-guided spec search, a local-cloud collaboration in which frontier cloud models propose edits across the spec at search time, only non-regressing edits are accepted, and the resulting spec runs entirely on-device at inference time. With LLM-guided spec search, on-device specs match or exceed cloud accuracy on 4 of 8 benchmarks and land within 3.2 pp of the best cloud baseline on average. They also reduce marginal API cost by ~800x and end-to-end latency by 4x.
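The accept-only-non-regressing loop described in the abstract can be illustrated with a minimal sketch. This is not the OpenJarvis API: the `Spec` fields, `spec_search`, `toy_propose`, and `toy_evaluate` are hypothetical names chosen for illustration, the proposer is a stand-in for a frontier cloud model, and "non-regressing" is assumed here to mean "does not lower a single scalar score", whereas the paper measures accuracy, cost, and latency.

```python
from dataclasses import dataclass, replace
from typing import Callable, Iterable, Tuple


@dataclass(frozen=True)
class Spec:
    """Hypothetical typed spec: one independently editable field per primitive."""
    intelligence: str   # which local model to run (assumed representation)
    engine: str         # inference runtime settings
    agents: str         # agent prompt or scaffold
    tools_memory: str   # tool descriptions and memory configuration
    learning: str       # adaptation policy


Edit = Tuple[str, str]  # (field name, proposed new value)


def spec_search(
    spec: Spec,
    propose: Callable[[Spec], Iterable[Edit]],
    evaluate: Callable[[Spec], int],
    rounds: int,
) -> Tuple[Spec, int]:
    """Greedy search that keeps an edit only if the score does not drop."""
    best_score = evaluate(spec)
    for _ in range(rounds):
        for field_name, value in propose(spec):
            candidate = replace(spec, **{field_name: value})
            score = evaluate(candidate)
            if score >= best_score:  # assumed meaning of "non-regressing"
                spec, best_score = candidate, score
    return spec, best_score


def toy_propose(spec: Spec) -> Iterable[Edit]:
    # Stand-in for a cloud model proposing edits at search time.
    return [("agents", "concise-prompt"), ("engine", "slow-runtime")]


def toy_evaluate(spec: Spec) -> int:
    # Stand-in for a benchmark harness; returns a made-up integer score.
    score = 50
    if spec.agents == "concise-prompt":
        score += 20
    if spec.engine == "slow-runtime":
        score -= 10
    return score


start = Spec("local-model", "default-runtime", "default-prompt",
             "default-tools", "none")
best, best_score = spec_search(start, toy_propose, toy_evaluate, rounds=2)
```

In this toy run the prompt edit is kept because it raises the score, and the runtime edit is rejected because it lowers it. The proposer is only called inside `spec_search`, which mirrors the abstract's separation between search time (cloud model involved) and inference time (the returned spec runs on its own).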

Scaling Intelligence Lab
