On-device personal AI
Selfweave is a privacy-first personal AI. Local inference, a private memory of your work, and — when you want it — distributed compute: trusted LAN devices today, a cooperative WAN you help build by contributing. Weave yourself into the network.
Selfweave separates what must stay private from what can be shared. You decide where the line sits — and how much of yourself you weave into the network.
Local LLM inference via llama.cpp. Private indexing of your code, documents, and photos. A personal adapter trained from your patterns, never leaving the device.
Run larger models by pooling VRAM across devices. Selfweave's roadmap extends this from LAN peers you choose to a cooperative pool of strangers — governed by attestation tiering, Sybil defense, and contribution-based incentives. The trust-and-economics layer is the differentiator, not the splitting itself.
An optional shared behavioural layer, trained via SPARTA + DiLoCo. Your gradients contribute to a collective intelligence without ever exposing your raw data.
Selfweave is built in four layers. Personal data lives above the encryption boundary and never crosses it in the clear. Distributed operations happen below it — with cryptographic guarantees, not promises.
The BIL is a small LoRA adapter — about 8–50 MB — that encodes aggregate human behaviour patterns: when people focus, how communication tone shifts across contexts, what people typically need at different times of day. It sits between the frozen base model and your private personal adapter. You get the benefit of the collective without the collective ever seeing your data.
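To make the layering concrete, here is a minimal sketch of how the three pieces compose at a single weight matrix; the dimensions, rank, and variable names are illustrative, not Selfweave's actual shapes.

```python
import torch

# Effective weight = frozen base + shared BIL delta + private personal delta.
# Each LoRA delta is a rank-r product B @ A; B starts at zero, the standard init.
d, r = 4096, 16                                        # hidden size, LoRA rank (illustrative)
W_base = torch.randn(d, d)                             # frozen base-model weight, never updated
A_bil, B_bil = torch.randn(r, d), torch.zeros(d, r)    # shared behavioural layer
A_me,  B_me  = torch.randn(r, d), torch.zeros(d, r)    # personal adapter, stays on-device

def forward(x: torch.Tensor) -> torch.Tensor:
    # The two deltas add independently, so the BIL can be updated or dropped
    # without retraining the personal adapter.
    W = W_base + B_bil @ A_bil + B_me @ A_me
    return x @ W.T
```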
The technique is borrowed, not invented. OpenFedLLM showed federated LoRA exchanging 0.06% of parameters per round can outperform a frontier model in a domain (Ye et al., 2024). Selfweave uses 0.1% via SPARTA sparse aggregation, with DiLoCo-style outer-loop synchronisation from DeepMind's distributed-training work and noise-budgeting from (ε, δ)-differential privacy (Dwork & Roth, 2014). Status: roadmap — cryptographic primitives ship in v0.2; federation activates when the peer-defense stack lands in v0.3.
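As a hedged sketch of what one sparse round might exchange: the 0.1% fraction is from the text above, but the magnitude top-k selection rule and tensor shapes here are illustrative stand-ins, not SPARTA's actual selection scheme.

```python
import torch

def sparse_round_payload(delta: torch.Tensor, frac: float = 0.001):
    # Only ~0.1% of adapter-delta coordinates leave the device per round,
    # as (index, value) pairs; the rest stay local.
    flat = delta.flatten()
    k = max(1, int(frac * flat.numel()))
    idx = flat.abs().topk(k).indices          # illustrative rule: largest magnitudes
    return idx, flat[idx]

# An 8M-parameter adapter yields ~8k values per round, noised against the
# (eps, delta)-DP budget before transmission.
idx, vals = sparse_round_payload(torch.randn(8_000_000))
```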
Selfweave's capacity story made concrete: per-user experience as the network grows, and what one peer's contribution covers. Privacy and cost are Selfweave properties at every scale.
| Config | Model | Decode tok/s | Context | Cost / Mtok (in / out) | Privacy | Phase |
|---|---|---|---|---|---|---|
| 1-node RTX 3080 Ti | Llama 3.1 8B Q4 | 89.7 | 128k | $0 | Local | shipped |
| 1-node + speculative | 8B Q4 + 1B draft | ~160–225 (proj.) | 128k | $0 | Local | v0.1.x |
| 3-node LAN | 14B Q4 split | ~12 (proj.) | 32k | $0 | LAN-trusted | v0.2.5 |
| 10-node cooperative pool | 70B Q4 sharded | ~10–30 (proj.) | 128k | $0 + contribute | Attestation-tiered | v0.3+ |
| 100-node mature pool | DeepSeek V3 (671B / 37B-MoE) | similar per-user | 128k+ | $0 + contribute | Attestation-tiered | v0.3+ mature |
| 1k-node mature pool | 405B+ dense | usable, large aggregate | 128k+ | $0 + contribute | Attestation-tiered | v0.3+ mature |
| ChatGPT (GPT-5) | proprietary | 135.8 | 128k | $1.25 / $10.00 | Provider-side | now |
| Claude Sonnet 4.6 | proprietary | 45.3 | 1M | $3.75 / $15.00 | Provider-side | now |
| DeepSeek V3 (hosted) | 671B / 37B-MoE | ~34 (provider median) | 128k | $0.40 / $0.89 | Provider-side | now |
Agentic loops compound WAN RTT across both LLM steps and tool I/O, so naïve split inference suits them poorly; the five layers below stack rather than substitute (level 2 is sketched after the table). Numbers are per-user tok/s on a 70B-class model over a Petals-class swarm (tech-spec §4.8).
| Stack level | Technique | Effective per-user tok/s | Status |
|---|---|---|---|
| 0 — baseline | Naïve split inference | ~5 | — |
| 1 — affinity | KV cache session affinity | ~8–10 | design (RT-008) |
| 2 — token spec | Token-level speculative decoding | ~15–20 | wire-up shipped |
| 3 — step spec | Step-level lookahead reasoning | ~30–50 | design |
| 4 — batching | Cross-loop batching at swarm peers | same per-user, 5–10× per peer | v0.2.5 |
| 5 — overlap | Tool-I/O ⇄ next-step prefill overlap | hides tool latency | design |
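Level 2 is the easiest to picture. A minimal greedy sketch, where callables stand in for the real models; in production, verification is batched into a single target forward pass rather than looped.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]   # argmax next-token fn, stands in for a model

def speculative_step(target: NextToken, draft: NextToken,
                     tokens: List[int], k: int = 4) -> List[int]:
    # Draft proposes k cheap tokens; target verifies and keeps the longest
    # agreeing prefix. With greedy decoding this is lossless: the output is
    # identical to running the target alone, just with fewer target passes.
    proposal = list(tokens)
    for _ in range(k):
        proposal.append(draft(proposal))
    accepted = list(tokens)
    for i in range(k):
        t = target(accepted)             # batched into one pass in practice
        accepted.append(t)               # target's token is always correct
        if t != proposal[len(tokens) + i]:
            break                        # first mismatch ends the accepted run
    return accepted
```

Level 3 applies the same accept-or-correct idea to whole reasoning steps instead of tokens, which is why the two stack.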
Capacity grows along three axes: peer count, model hidden-state size (smaller is better — egress, not compute, is the bottleneck per tech-spec §4.6), and the RT-009 bandwidth-efficiency lever stack (INT4 activation quantisation, adaptive compression at slow links, cross-loop batching at swarm peers, tier stratification). Cells show users servable at 5% concurrency with the §4.7 30–60% jitter discount baked in — a contributing peer still covers more user-pipeline work than its own household consumes. The 405B+ dense row is honest about being structurally bandwidth-heavy: the same swarm serves an order of magnitude more users on MoE-class models. Numbers are projections, not measurements; see the sketch after the table for the arithmetic behind the ratios.
| Model class (hidden state) | Phase 3 launch (v0.3 · ~100 peers · no levers) | Phase 3 + RT-009 initial (v0.3.x · ~1k peers · ~5× stack) | Phase 4 mature, full RT-009 (v0.4+ · ~10k peers · ~10× stack) |
|---|---|---|---|
| 8B (8 KB) | ~120–240 | ~6k–12k | ~120k–240k |
| 14B (10 KB) | ~95–190 | ~4.7k–9.5k | ~95k–190k |
| 70B (16 KB, baseline) | ~60–120 | ~3k–6k | ~60k–120k |
| DeepSeek V3 (14 KB, 671B / 37B-MoE) | ~70–140 | ~3.5k–7k | ~70k–140k |
| 405B+ dense (32 KB) | ~30–60 | ~1.5k–3k | ~30k–60k |
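The hidden-state column drives those ratios. A back-of-envelope sketch, with an illustrative 1 MB/s of usable per-peer egress standing in for the real §4.6 ceiling, which also accounts for pipeline depth and concurrency:

```python
def token_passes_per_s(egress_bytes_per_s: float, hidden_state_bytes: int) -> float:
    # Every generated token must cross each pipeline boundary once, so a
    # peer's egress budget divides directly by the activation it forwards.
    return egress_bytes_per_s / hidden_state_bytes

for name, hs in [("8B", 8_192), ("70B baseline", 16_384), ("405B+ dense", 32_768)]:
    print(f"{name}: ~{token_passes_per_s(1e6, hs):.0f} token-passes/s per peer")
# Halving the hidden state doubles capacity: the same 2x : 1x : 0.5x spread
# as the 8B, 70B, and 405B+ rows above.
```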
Sources: 1-node decode from a community llama.cpp benchmark on RTX 3080 Ti running Llama 3.1 8B Q4_K_M (localscore.ai). Speculative-decoding 1.8–2.5× from inference-speedups design §3.1 (projected). Multi-node figures derived from _config/tech-spec.md §4.6–4.8: per-peer egress ceiling, swarm capacity model, agentic-loop workaround stack. Effective tok/s discounts theoretical peak by 30–60% per the §4.7 jitter / churn / tail-latency caveat — Petals in production runs at ~6 tok/s/user, so this discount is empirical, not pessimistic. Phase-3 lever multipliers (~5× initial, ~10× full stack) reflect cross-loop batching at swarm peers (§4.8 layer 4: same per-user latency, ~5–10× per-peer throughput) compounded with INT4 activation quantisation (~2× egress per token-pass) and AdaTopK adaptive compression at slow links — all tracked under RT-009; benchmarks pending, ranges are conservative midpoints. Cloud numbers from artificialanalysis.ai and vendor pricing pages, May 2026. Projections labelled (proj.) are design targets, not measurements.
Selfweave exposes an OpenAI-compatible API and a local editor proxy with project-scoped RAG. Your codebase is indexed on-device; completions, fill-in-the-middle, and chat all run against the same private context.
No cloud round-trip. No prompts mined for training. No telemetry of what you type.
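From the client side it looks like any hosted endpoint. A minimal sketch; the port, key, and model name here are illustrative, so check your local Selfweave config.

```python
from openai import OpenAI

# Same client code you'd use against a cloud provider, pointed at localhost.
client = OpenAI(base_url="http://127.0.0.1:8080/v1", api_key="selfweave-local")
resp = client.chat.completions.create(
    model="llama-3.1-8b-q4",
    messages=[{"role": "user", "content": "Summarise my notes on RT-009."}],
)
print(resp.choices[0].message.content)
```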
Selfweave is in active development. Shipped features work today on Windows; roadmap items are being built in the open.
Completions, fill-in-the-middle, embeddings. Project-scoped RAG with a debounced file watcher. OpenAI-compatible, editor-agnostic.
Query your own document library with citation-grounded responses. Widened search parameters and a structured five-element prompt.
Dual FAISS indices — CLIP for photos, MiniLM for text. Search your life in natural language, offline.
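A hedged sketch of the text half of that pattern; the model name and 384-dim size are real MiniLM defaults, the corpus is a stand-in, and the photo half is the same shape with 512-dim CLIP image embeddings.

```python
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")            # 384-dim text embeddings
docs = ["trip itinerary", "2024 tax notes", "beach photo shot list"]
vecs = model.encode(docs, normalize_embeddings=True)

index = faiss.IndexFlatIP(384)                             # cosine via inner product on unit vectors
index.add(np.asarray(vecs, dtype="float32"))

q = model.encode(["where did I plan the beach trip?"], normalize_embeddings=True)
scores, ids = index.search(np.asarray(q, dtype="float32"), k=2)
print([docs[i] for i in ids[0]])                           # nearest documents, fully offline
```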
Honesty is measurable. Lexical overlap, agreement markers, and a benchmark harness gate every release. No yes-men.
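A toy sketch of what such a gate checks; the markers and threshold are illustrative, and the real harness derives from the ELEPHANT benchmark cited in the research section.

```python
AGREEMENT = ("you're right", "great point", "absolutely", "i agree")

def lexical_overlap(user: str, reply: str) -> float:
    u, r = set(user.lower().split()), set(reply.lower().split())
    return len(u & r) / max(1, len(r))      # share of reply words echoed from the user

def flags_sycophancy(user: str, reply: str) -> bool:
    opener = reply.strip().lower()
    return opener.startswith(AGREEMENT) or lexical_overlap(user, reply) > 0.6

print(flags_sycophancy("My plan is perfect, right?",
                       "You're right, the plan is perfect."))   # True
```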
Three profiles — Strict, Basic, Permissive. Windows Job Objects + AppContainer + restricted tokens. Consent modal before elevation.
Tag-based tool use with a registry + executor. Extend the assistant with local tools — no remote webhook required.
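A minimal sketch of the registry-plus-executor pattern; the tag grammar and function names here are illustrative, not Selfweave's actual schema.

```python
import re
from typing import Callable, Dict

REGISTRY: Dict[str, Callable[[str], str]] = {}

def tool(name: str):
    # Registering a local function makes it callable from model output.
    def register(fn: Callable[[str], str]):
        REGISTRY[name] = fn
        return fn
    return register

@tool("word_count")
def word_count(args: str) -> str:
    return str(len(args.split()))

TAG = re.compile(r'<tool name="(\w+)">(.*?)</tool>', re.S)

def execute(model_output: str) -> str:
    # The executor swaps each tag for its tool's result; nothing leaves the device.
    return TAG.sub(lambda m: REGISTRY[m.group(1)](m.group(2)), model_output)

print(execute('Count: <tool name="word_count">four words right here</tool>'))  # Count: 4
```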
Pool VRAM across household devices via QR-paired peers. Pipeline-parallel split over libp2p Noise transport — the same primitive that extends to the cooperative pool when the trust layer ships.
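As a rough sketch of the splitting primitive: assign contiguous layer ranges in proportion to each peer's free VRAM. Peer names and figures are made up, and the real planner also weighs link speed.

```python
def split_layers(n_layers: int, peers: dict[str, float]) -> dict[str, range]:
    # Proportional split by free VRAM (GB); the last peer absorbs rounding.
    total = sum(peers.values())
    plan, start = {}, 0
    names = list(peers)
    for i, name in enumerate(names):
        count = (n_layers - start if i == len(names) - 1
                 else round(n_layers * peers[name] / total))
        plan[name] = range(start, start + count)
        start += count
    return plan

print(split_layers(40, {"desktop": 12.0, "laptop": 8.0, "htpc": 4.0}))
# {'desktop': range(0, 20), 'laptop': range(20, 33), 'htpc': range(33, 40)}
```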
Contribute idle GPU to the network, earn priority routing. Contribution-based incentive, no tokens, no speculation.
Shared behavioural intelligence trained across users via SPARTA + DiLoCo. You get the benefit of the collective without sharing the data.
These are not promises — they are properties of the system.
Selfweave is an integration of peer-reviewed research and proven open-source projects, not a stack of hopes. Each capability below traces back to published work — verifiable, attributable, replicable.
OpenFedLLM (Ye et al., 2024) — federated LoRA exchanging 0.06% of parameters per round outperformed GPT-4 in financial-domain benchmarks. Selfweave uses 0.1% via SPARTA sparse aggregation.
DiLoCo (DeepMind) and INTELLECT-1 / INTELLECT-2 (Prime Intellect, arXiv 2505.07291, 2025) — 32B-parameter distributed RL across continents on consumer-class infrastructure. Validates Selfweave's post-v1.0 pre-training direction.
(ε, δ)-DP (Dwork & Roth, 2014). ε = 1.0 matches Apple iOS analytics and academic federated-healthcare research — strong enough to be a meaningful guarantee, loose enough for the BIL to actually learn.
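For scale, the classic Gaussian-mechanism calibration from that book shows the noise level such a budget implies for a single release; a sketch only, since the actual BIL accountant must compose the budget across training rounds.

```python
from math import sqrt, log

def gaussian_sigma(eps: float, delta: float, sensitivity: float = 1.0) -> float:
    # Dwork & Roth's analytic bound: sigma >= sqrt(2 ln(1.25/delta)) * S / eps.
    return sqrt(2 * log(1.25 / delta)) * sensitivity / eps

print(round(gaussian_sigma(1.0, 1e-5), 2))   # ~4.84x the clip norm at eps = 1.0
```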
EAGLE-3 (Li et al., arXiv 2503.01840, 2025) and Speculative Streaming (Bhendawade et al., arXiv 2402.11131, 2024). Selfweave wires llama.cpp's --model-draft mechanism for lossless decode speedup.
Lookahead Reasoning (ICLR 2026) — orthogonal to token-level speculation; both stack. Aimed at agentic loops where reasoning steps, not tokens, are the speculation unit.
Petals (Borzunov et al., arXiv 2312.08361, 2023) and Parallax (Gradient Network, arXiv 2509.26182, 2025). Pipeline-parallel inference over commodity links — Selfweave inherits the pattern, adds attestation tiering and Sybil defense.
HBCP / BCIO (Mac Aonghusa & Michie, 2020) — the Human Behaviour-Change Project's ontology. Selfweave's 30–50-concept behavioural map mirrors its role: a unified pipeline-friendly representation of diverse natural-language behaviours.
Artificial Behaviour Intelligence (Jo et al., arXiv 2505.03315, 2025) — formalises behaviour understanding in cultural and situational context, with calibrated uncertainty. Direct inspiration for the BIL's context modelling.
ELEPHANT benchmark (Cheng et al., arXiv 2505.13995, 2025) — measures social sycophancy in LLM responses. Selfweave runs a derivative harness before every adapter release and base-model upgrade.
Selfweave's core features run forever on your own hardware at no cost. Paid tiers cover managed conveniences we can't run on your laptop.
The complete local AI. Your hardware, your rules.
End-to-end sync + managed conveniences. Privacy intact.
Everything in Plus, at professional scale.
Selfweave is a long-term solo project. Progress is steady, not flashy. Here's what's real and what's next.
Selfweave is in closed pre-release. Join the waitlist for a build, a changelog entry per week, and a direct line to the developer. Then, when you're ready, weave yourself into the network.