Architecture
This document describes the current implementation as of v0.6.2 and
highlights tradeoffs, assumptions, and known constraints.
1) Entry Points
Primary runtime entry points:
- Sidepanel app: sidepanel.tsx -> src/sidepanel/app.tsx
- Options app: options.tsx -> src/options/app.tsx
- Background service worker: src/background/index.ts
- Content scripts: src/contents/index.ts, src/contents/selection-button.tsx
These map to extension pages generated by WXT.
2) System Responsibilities
Sidepanel
- Chat interaction UX
- Session display and branch navigation
- Streaming state updates
- Local chat actions (edit, fork, delete, export)
Options
- Provider configuration
- Model parameters
- Embedding/RAG configuration
- Feature toggles and diagnostics
Background worker
- Provider resolution and streaming orchestration
- Model management handlers
- Embedding generation handlers for file chunks
- Browser-level APIs (DNR/CORS rules, context menu)
Content scripts
- Selected-text capture
- Page extraction entrypoints for browser context workflows
3) Data Flow (UI -> Background -> Provider -> Stream -> Storage)
- User sends prompt in sidepanel.
- UI opens a runtime port (MESSAGE_KEYS.PROVIDER.STREAM_RESPONSE) to background.
- Background receives CHAT_WITH_MODEL and resolves the provider using model mapping.
- Provider starts streaming tokens back to background.
- Background relays chunks to UI through port messages.
- UI applies optimistic updates and persists completed messages in local chat store.
- Optional embedding pipelines index chat/file content for retrieval.
flowchart LR
A["Sidepanel UI (React)"] --> B["Runtime Port (STREAM_RESPONSE)"]
B --> C["Background Worker"]
C --> D["ProviderFactory resolve by model mapping"]
D --> E["Ollama Provider"]
D --> F["LM Studio Provider"]
D --> G["llama.cpp Provider"]
E --> H["Chunk Stream"]
F --> H
G --> H
H --> I["UI Stream State Update"]
I --> J["Dexie Chat Store (IndexedDB)"]
I --> K["Optional RAG Pipeline"]
K --> L["Embedding Strategy Chain"]
L --> M["Vector DB (Dexie IndexedDB)"]
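The UI-side step above (relayed chunks feeding optimistic state updates) reduces to a pure reducer applied per port message. The chunk and state shapes below are illustrative assumptions, not the extension's real types:

```typescript
// Hypothetical port-message shapes -- the real ones live behind
// MESSAGE_KEYS.PROVIDER.STREAM_RESPONSE; these are illustrative only.
interface StreamChunk {
  streamId: string;
  delta: string;   // incremental token text
  done: boolean;
}

interface StreamState {
  streamId: string | null;
  text: string;
  streaming: boolean;
}

// Pure reducer applied for every chunk the UI receives over the port.
// Text grows optimistically; persistence happens once done === true.
function applyChunk(state: StreamState, chunk: StreamChunk): StreamState {
  if (state.streamId !== null && state.streamId !== chunk.streamId) {
    return state; // ignore chunks from a stale or cancelled stream
  }
  return {
    streamId: chunk.done ? null : chunk.streamId,
    text: state.text + chunk.delta,
    streaming: !chunk.done,
  };
}
```

Keeping the reducer pure makes the stream state easy to test independently of the port plumbing.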
4) Model Selection and Provider Routing
Model selection
- Selected model key is persisted under the provider key path (STORAGE_KEYS.PROVIDER.SELECTED_MODEL), with legacy reads.
- Model list is built by querying all enabled providers in useProviderModels.
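The "legacy reads" above imply a two-step lookup: try the provider-scoped key, then fall back to the older Ollama-era key. A minimal sketch, with key strings that are assumptions (the real constants live in STORAGE_KEYS):

```typescript
// Illustrative key names -- the real values come from STORAGE_KEYS.
const SELECTED_MODEL_KEY = "provider:selected-model";
const LEGACY_SELECTED_MODEL_KEY = "ollama:selected-model";

type Getter = (key: string) => Promise<string | undefined>;

// Read the provider-scoped key first, then fall back to the legacy
// key so older installs keep their previous model selection.
async function readSelectedModel(get: Getter): Promise<string | undefined> {
  return (await get(SELECTED_MODEL_KEY)) ?? (await get(LEGACY_SELECTED_MODEL_KEY));
}
```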
Provider integration
- Provider configs are persisted via ProviderManager (ProviderStorageKey.CONFIG).
- Default profiles: Ollama, LM Studio, llama.cpp.
- Per-model provider routing is stored via ProviderStorageKey.MODEL_MAPPINGS.
- Background routing is performed by ProviderFactory.getProviderForModel(modelId).
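At its core, routing by model mapping is a dictionary lookup with a default. The sketch below assumes a flat mapping record and an Ollama default; the real ProviderFactory may carry more context:

```typescript
type ProviderId = "ollama" | "lmstudio" | "llamacpp";

// Hypothetical shape of the persisted MODEL_MAPPINGS record.
type ModelMappings = Record<string, ProviderId>;

// Resolve the provider for a model id from the stored mapping,
// falling back to a default provider when no mapping exists.
function resolveProviderForModel(
  modelId: string,
  mappings: ModelMappings,
  fallback: ProviderId = "ollama",
): ProviderId {
  return mappings[modelId] ?? fallback;
}
```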
5) Streaming Architecture
Streaming occurs over extension runtime ports:
- UI hook: src/features/chat/hooks/use-chat-stream.ts
- Background handler: src/background/handlers/handle-chat-with-model.ts
- Abort/cancel handling: abort-controller-registry
Why this design:
- Runtime ports support continuous chunk delivery better than one-shot messages.
- Cancel support is clean via AbortControllers scoped to active stream keys.
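A registry of AbortControllers keyed by stream id, as described above, can be sketched in a few lines; the real abort-controller-registry module likely adds more lifecycle handling:

```typescript
// Minimal sketch of an abort-controller registry keyed by stream key.
class AbortControllerRegistry {
  private controllers = new Map<string, AbortController>();

  // Begin tracking a stream; returns the signal to pass to fetch/provider calls.
  start(streamKey: string): AbortSignal {
    this.cancel(streamKey); // replace any stale stream using the same key
    const controller = new AbortController();
    this.controllers.set(streamKey, controller);
    return controller.signal;
  }

  // Abort and forget the stream for this key (no-op if unknown).
  cancel(streamKey: string): void {
    this.controllers.get(streamKey)?.abort();
    this.controllers.delete(streamKey);
  }
}
```

Scoping one controller per stream key means cancelling one chat stream cannot affect another.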
Tradeoff:
- Message keys are provider-named (PROVIDER.*) with legacy OLLAMA.* compatibility.
6) Storage Architecture
Active runtime storage
- Local SQL WASM storage (sql.js) for chat/session/embeddings data.
- Settings/provider config: @plasmohq/storage (via the plasmoGlobalStorage wrapper).
- Export/restore uses ZIP bundles with versioned manifests.
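A versioned manifest typically lets restore refuse bundles newer than the running schema. The shape below is a hypothetical sketch, not the extension's actual manifest format:

```typescript
// Hypothetical manifest shape for the versioned ZIP export bundle.
interface ExportManifest {
  schemaVersion: number; // bumped on breaking storage changes
  appVersion: string;    // e.g. "0.6.2"
  exportedAt: string;    // ISO timestamp
  tables: string[];      // chat/session/embedding tables included
}

// Restore should refuse manifests written by a newer schema than we support.
function canRestore(manifest: ExportManifest, supportedSchema: number): boolean {
  return manifest.schemaVersion <= supportedSchema;
}
```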
7) RAG/Embedding Architecture (Current)
- Embeddings are generated via a browser-safe strategy chain.
- Content is chunked and stored in the local SQL WASM store.
- Query-time retrieval uses hybrid search with adaptive weighting.
- Pipeline includes diversity filtering and recency/feedback score hooks.
- Browser-first module contracts for the next-step refactor are documented in src/lib/rag/core/interfaces.ts.
- Embeddings use a fallback chain: provider-native -> shared model -> background warmup -> Ollama fallback.
- Background model preparation currently performs pull operations only through Ollama handlers.
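The hybrid search with adaptive weighting mentioned above amounts to blending a vector-similarity score with a keyword score, shifting weight by query characteristics. The specific heuristic below (short queries lean on keywords) is an assumption for illustration:

```typescript
// Sketch of hybrid scoring: blend vector similarity with keyword match.
// The adaptive rule here -- weight keywords more for very short queries --
// is illustrative; the real pipeline's weighting may differ.
function hybridScore(
  vectorScore: number,   // e.g. cosine similarity, normalized to [0, 1]
  keywordScore: number,  // e.g. normalized BM25-style score in [0, 1]
  queryTokenCount: number,
): number {
  const vectorWeight = queryTokenCount <= 2 ? 0.4 : 0.7;
  return vectorWeight * vectorScore + (1 - vectorWeight) * keywordScore;
}
```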
Important constraint:
- There is no OCR pipeline and no WASM-based reranker in v0.6.2.
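The embedding fallback chain described above (provider-native -> shared model -> background warmup -> Ollama) reduces to a first-success loop over strategies. Types and names here are illustrative; the real contracts live in src/lib/rag/core/interfaces.ts:

```typescript
// One entry per step in the documented chain; a strategy returns null
// (or throws) when it cannot produce an embedding.
type EmbeddingStrategy = (text: string) => Promise<number[] | null>;

// Try each strategy in order; the first non-null vector wins.
async function embedWithFallback(
  text: string,
  chain: EmbeddingStrategy[],
): Promise<number[]> {
  for (const strategy of chain) {
    try {
      const vector = await strategy(text);
      if (vector) return vector;
    } catch {
      // swallow the failure and fall through to the next strategy
    }
  }
  throw new Error("all embedding strategies failed");
}
```

Treating throws and null the same keeps a flaky local endpoint from taking the whole pipeline down.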
8) Why Background Worker Is Used
- Keeps provider network I/O and long-running operations off UI thread.
- Centralizes extension APIs that are unavailable or unsafe in UI contexts.
- Simplifies cancellation and stream lifecycle tracking.
9) Tradeoffs and Architectural Decisions
- Legacy naming retained for compatibility
  - Pro: avoids migration breakage.
  - Con: causes confusion in multi-provider code paths.
- Dexie runtime + SQLite migration path
  - Pro: stable current UX with incremental migration work.
  - Con: two persistence strategies increase maintenance overhead.
- Provider-agnostic chat with provider-specific management features
  - Pro: fast rollout of multi-provider chat.
  - Con: uneven feature parity (pull/delete/version are Ollama-centric).
- Local retrieval pipeline within extension constraints
  - Pro: privacy-preserving retrieval.
  - Con: CSP/performance limits prevent full in-browser model/reranker parity.
10) Assumptions and Constraints
Assumptions:
- User can run at least one provider endpoint.
- Endpoint URLs are reachable from extension context.
- Local resources are sufficient for selected models.
Constraints:
- Chrome extension CSP limits some WASM/worker ML paths.
- Firefox lacks Chrome's DNR (declarativeNetRequest) API behavior.
- Provider model naming collisions can cause ambiguous mapping behavior.
11) Known Risks / Technical Debt
- Legacy ollama-* keys retained for compatibility while provider naming becomes the default
- Partial provider parity in model-management actions
- Dual persistence architecture during migration period
- Retrieval quality depends on chunking/threshold tuning and model quality
12) Desktop Design Notes (Non-Implementation)
- Provider abstraction (factory/manager/types) is intentionally runtime-agnostic and can be reused in a desktop app.
- Provider identity metadata (icons/display names) should remain shared via src/lib/providers/registry.ts.
- Browser-only APIs (DNR, extension messaging) are already isolated in background handlers and would map to Electron main-process equivalents.
- Storage keys are provider-agnostic with legacy shims; a desktop app can reuse the same keys to migrate settings.
13) Near-Term Architecture Priorities
- Normalize provider-agnostic naming.
- Decide single source of truth for chat persistence.
- Expand provider parity for management actions.
- Improve retrieval observability and failure diagnostics.