How it works

A short tour of how NeuroBridgeEDU is put together — the desktop shell, the audio pipeline, the transcription engine, and the summary engine.

The shape of the application

NeuroBridgeEDU is a Tauri 2 desktop app. Tauri wraps a native window around a webview, with a Rust process handling everything that needs hardware access or persistence. Two consequences fall out of that shape:

  1. All sensitive operations — audio capture, file I/O, model inference, keychain access — live in Rust. The frontend never talks directly to your hardware.
  2. The frontend is fully static. There's no SSR, no API routes, no Node server. Every interaction goes through a typed Tauri command.
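
That command boundary can be pictured as a name-to-handler dispatch table. The sketch below is plain Rust illustrating the pattern only; the real app uses Tauri's command macros and generated bindings, and every name in it is hypothetical:

```rust
use std::collections::HashMap;

/// One handler per command: take a serialized payload, return a result.
/// This mirrors the shape of the boundary, not Tauri's macro-generated plumbing.
type Handler = fn(&str) -> Result<String, String>;

fn start_recording(_args: &str) -> Result<String, String> {
    // Hardware access stays on the Rust side of the boundary.
    Ok("recording".to_string())
}

fn build_registry() -> HashMap<&'static str, Handler> {
    let mut commands: HashMap<&'static str, Handler> = HashMap::new();
    commands.insert("start_recording", start_recording as Handler);
    commands
}

/// The webview can only name a command; unknown names fail closed.
fn invoke(
    registry: &HashMap<&'static str, Handler>,
    name: &str,
    args: &str,
) -> Result<String, String> {
    match registry.get(name) {
        Some(handler) => handler(args),
        None => Err(format!("unknown command: {name}")),
    }
}
```

The point of the shape: the frontend holds no capability beyond naming a command, so anything not registered on the Rust side simply doesn't exist.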

The pipeline, end to end:

Microphone ──┐
             ├──► Audio Pipeline (Rust)
System audio ┘     │
                   ├──► WAV file on disk
                   │
                   └──► VAD (strips silence)
                         │
                         └──► Whisper.cpp ──► transcript chunks
                                                │
                                                └──► Mistral / EuroLLM ──► summary
                                                                            │
                                                                            └──► SQLite (FTS5)

The frontend layer

  • Routing — Next.js App Router. Pages: Home, Meetings, Meeting (detail), Settings, Analytics, Help, About.
  • State — React Context for everything. Eleven contexts cover Recording, Meetings, Chat, Translation (i18n), Licence, Connectivity, Onboarding, Import, Theme, Toast, and Sidebar.
  • Tauri bridge — every Rust command has a typed wrapper in src/lib/tauri.ts. Components never call invoke() directly.
  • Internationalisation — runtime JSON loads from public/locales/{locale}.json. 30+ European languages.
  • UI library — Tailwind CSS + shadcn/ui primitives + a custom CubeLoader.

The Rust backend

Each feature is a self-contained module under src-tauri/src/:

  • audio/ — capture, VAD, mixing, recording-to-WAV.
  • transcription/ — Whisper engine, model lifecycle.
  • summary/ — LLM orchestration, six providers (two local, four cloud), templates, action items.
  • chat/ — Study Sage commands; Pro-gated commands for flashcards and exam analysis.
  • database/ — SQLite repositories, migrations, FTS5 search, analytics.
  • security/ — input validation, model header checks.
  • export.rs — Markdown → Typst → PDF (Pro).
  • connectivity.rs — 60-second network polling, offline fallback signal.
  • license.rs — Lemon Squeezy + OS keychain + 30-day grace.
  • diagnostics.rs — log export, telemetry opt-in, bug reports.
  • permissions.rs — macOS permission requests.
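
As a flavour of how small these modules are, connectivity.rs boils down to a polling loop. A minimal sketch, with the check injected as a closure and a tick count added purely so it can terminate in a test (the real module polls every 60 seconds):

```rust
use std::time::Duration;

/// Run `check` every `interval` for `ticks` rounds and report the last
/// observed state. The injectable closure and finite tick count are
/// testing conveniences, not part of the real module's signature.
fn poll_connectivity(check: impl Fn() -> bool, interval: Duration, ticks: u32) -> bool {
    let mut online = false;
    for _ in 0..ticks {
        online = check();
        std::thread::sleep(interval);
    }
    online
}
```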

The audio pipeline

Capture happens at 48 kHz. Audio destined for Whisper gets resampled to 16 kHz mono before inference. The pipeline runs two paths in parallel:

  • Recording path — RMS-based ducking (mic ducks system audio), soft limiting / anti-clipping, then to a WAV file on disk.
  • Transcription path — RMS-energy Voice Activity Detection strips silence (saves 10–60% of inference time depending on how chatty the recording is), then to the Whisper engine.
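
The recording path's ducking and limiting can be sketched in a few lines: measure the mic's RMS, attenuate the system feed when the mic is hot, then soft-limit the mix. The 0.05 threshold, 0.3 duck gain, and tanh limiter below are illustrative assumptions, not the app's actual tuning:

```rust
/// RMS energy of one frame of samples in [-1.0, 1.0].
fn rms(frame: &[f32]) -> f32 {
    if frame.is_empty() {
        return 0.0;
    }
    (frame.iter().map(|s| s * s).sum::<f32>() / frame.len() as f32).sqrt()
}

/// Mix mic and system audio: duck the system feed when the mic is active,
/// then soft-limit with tanh so the sum approaches ±1.0 instead of clipping.
fn mix_with_ducking(mic: &[f32], system: &[f32]) -> Vec<f32> {
    let duck_gain = if rms(mic) > 0.05 { 0.3 } else { 1.0 };
    mic.iter()
        .zip(system.iter())
        .map(|(m, s)| (m + s * duck_gain).tanh())
        .collect()
}
```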

File import (MP3, M4A / AAC, FLAC, OGG, WAV) uses Symphonia and joins the same pipeline at the transcription stage — there's no separate import code path.
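
The RMS-energy VAD that both live audio and imports pass through amounts to a frame filter: chunk the 16 kHz mono stream, keep only frames whose energy clears a threshold. The 10 ms frame length and 0.01 threshold below are illustrative assumptions:

```rust
/// Drop frames whose RMS energy falls below `threshold`, keeping only
/// the audio worth sending to the Whisper engine.
fn strip_silence(samples: &[f32], frame_len: usize, threshold: f32) -> Vec<f32> {
    samples
        .chunks(frame_len)
        .filter(|frame| {
            let mean_sq =
                frame.iter().map(|s| s * s).sum::<f32>() / frame.len() as f32;
            mean_sq.sqrt() > threshold
        })
        .flat_map(|frame| frame.iter().copied())
        .collect()
}
```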

The transcription engine

Whisper runs via whisper-rs, with GPU acceleration selected at build time: Metal on macOS, CUDA on NVIDIA, Vulkan on AMD / Intel. If no GPU backend is available the engine falls back to CPU and still works — just slower.

Models range from ~39 MB (tiny) to ~2.9 GB (large-v3); the catalog also includes the modern large-v3-turbo and several quantised q5_0 variants. Whisper runs in a single pass per file. The model handles its own internal ~30-second windowing, and segment results stream back to the frontend as they're produced.
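
The build-time backend choice with a runtime CPU fallback can be sketched with cfg! checks. The enum, function, and feature name here are illustrative, not whisper-rs's actual API:

```rust
#[derive(Debug, PartialEq)]
enum Backend {
    Metal,
    Cuda,
    Vulkan,
    Cpu,
}

/// Compile-time cfg! picks the platform default; a runtime probe
/// (stubbed here as a bool) can still force the CPU fallback.
fn select_backend(gpu_available: bool) -> Backend {
    if !gpu_available {
        return Backend::Cpu; // slower, but transcription still works
    }
    if cfg!(target_os = "macos") {
        Backend::Metal
    } else if cfg!(feature = "cuda") {
        Backend::Cuda
    } else {
        Backend::Vulkan
    }
}
```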

The summary engine

Six logical providers sit behind a common SummaryProvider trait:

  • builtin-ai — llama-cpp-2 running GGUF models on-device.
  • ollama — talks to a local Ollama HTTP server.
  • openai, anthropic, mistral, google-gemini — cloud providers, used only when you supply an API key (or for managed Pro access).

The default for new installs is builtin-ai — fully on-device, no internet required. Cloud providers are explicit opt-in per session.
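
A pared-down version of that seam, showing the trait plus the key-gated choice between local and cloud. Method names, the error type, and both stub structs are illustrative, not the app's actual trait:

```rust
/// One trait, many backends: each provider only has to turn a
/// transcript into a summary.
trait SummaryProvider {
    fn name(&self) -> &'static str;
    fn summarise(&self, transcript: &str) -> Result<String, String>;
}

/// Stub for the on-device default (GGUF inference in the real app).
struct BuiltinAi;

impl SummaryProvider for BuiltinAi {
    fn name(&self) -> &'static str {
        "builtin-ai"
    }
    fn summarise(&self, transcript: &str) -> Result<String, String> {
        Ok(format!("local summary of {} chars", transcript.len()))
    }
}

/// Stub standing in for any of the cloud providers.
struct CloudProvider {
    name: &'static str,
}

impl SummaryProvider for CloudProvider {
    fn name(&self) -> &'static str {
        self.name
    }
    fn summarise(&self, transcript: &str) -> Result<String, String> {
        Ok(format!("cloud summary of {} chars", transcript.len()))
    }
}

/// No API key means the fully local default; supplying one opts in
/// to a cloud backend for that session.
fn pick_provider(api_key: Option<&str>) -> Box<dyn SummaryProvider> {
    match api_key {
        None => Box::new(BuiltinAi),
        Some(_) => Box::new(CloudProvider { name: "mistral" }),
    }
}
```

Because callers only ever see `Box<dyn SummaryProvider>`, adding a seventh backend touches one module and nothing downstream.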

What stays local, what moves

  • Always local: recordings (WAV files), transcripts, summaries, settings, the SQLite database, your meetings index.
  • Encrypted at rest: API keys for cloud providers (OS keychain on macOS, DPAPI on Windows, libsecret on Linux).
  • Only when you opt in: the content of a single summarisation request goes to the cloud provider you chose, directly from your machine. We don't proxy or intercept.
  • Optional, off by default: anonymous error reports (stack trace + app version + OS version, never any user content).