Architecture

This document describes the current implementation as of v0.6.2 and highlights tradeoffs, assumptions, and known constraints.

1) Entry Points

Primary runtime entry points:

  • Sidepanel app: sidepanel.tsx -> src/sidepanel/app.tsx
  • Options app: options.tsx -> src/options/app.tsx
  • Background service worker: src/background/index.ts
  • Content scripts: src/contents/index.ts, src/contents/selection-button.tsx

These map to extension pages generated by WXT.

2) System Responsibilities

Sidepanel

  • Chat interaction UX
  • Session display and branch navigation
  • Streaming state updates
  • Local chat actions (edit, fork, delete, export)

Options

  • Provider configuration
  • Model parameters
  • Embedding/RAG configuration
  • Feature toggles and diagnostics

Background worker

  • Provider resolution and streaming orchestration
  • Model management handlers
  • Embedding generation handlers for file chunks
  • Browser-level APIs (DNR/CORS rules, context menu)

Content scripts

  • Selected-text capture
  • Page extraction entrypoints for browser context workflows

3) Data Flow (UI -> Background -> Provider -> Stream -> Storage)

  1. User sends prompt in sidepanel.
  2. UI opens a runtime port (MESSAGE_KEYS.PROVIDER.STREAM_RESPONSE) to background.
  3. Background receives CHAT_WITH_MODEL and resolves the provider through the stored model mapping.
  4. Provider starts streaming tokens back to background.
  5. Background relays chunks to UI through port messages.
  6. UI applies optimistic updates and persists completed messages in local chat store.
  7. Optional embedding pipelines index chat/file content for retrieval.
A Mermaid sketch of the same flow:

flowchart LR
    A["Sidepanel UI (React)"] --> B["Runtime Port (STREAM_RESPONSE)"]
    B --> C["Background Worker"]
    C --> D["ProviderFactory resolve by model mapping"]
    D --> E["Ollama Provider"]
    D --> F["LM Studio Provider"]
    D --> G["llama.cpp Provider"]
    E --> H["Chunk Stream"]
    F --> H
    G --> H
    H --> I["UI Stream State Update"]
    I --> J["Dexie Chat Store (IndexedDB)"]
    I --> K["Optional RAG Pipeline"]
    K --> L["Embedding Strategy Chain"]
    L --> M["Vector DB (Dexie IndexedDB)"]
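The relay step (background receives provider tokens, forwards them over the port, UI accumulates them) can be sketched as below. This is an illustrative simulation, not the real API: StreamChunk, PortLike, relayStream, and makeUiSink are assumed names, and a plain object stands in for the chrome.runtime port.

```typescript
// Message shape flowing over the runtime port (assumed, not the real types).
type StreamChunk =
  | { type: "chunk"; content: string }
  | { type: "done" }
  | { type: "error"; message: string };

// Minimal stand-in for one end of a chrome.runtime.Port.
interface PortLike {
  postMessage(msg: StreamChunk): void;
}

// Background side: relay provider tokens to the UI port, then signal completion.
async function relayStream(tokens: AsyncIterable<string>, port: PortLike): Promise<void> {
  try {
    for await (const t of tokens) port.postMessage({ type: "chunk", content: t });
    port.postMessage({ type: "done" });
  } catch (e) {
    port.postMessage({ type: "error", message: String(e) });
  }
}

// UI side: accumulate chunks as optimistic state updates.
function makeUiSink() {
  let text = "";
  let finished = false;
  return {
    onMessage(msg: StreamChunk) {
      if (msg.type === "chunk") text += msg.content;
      if (msg.type === "done") finished = true;
    },
    state() {
      return { text, finished };
    },
  };
}

// Fake provider emitting two tokens, wired end to end.
async function* fakeProvider(): AsyncGenerator<string> {
  yield "Hel";
  yield "lo";
}

async function demo() {
  const sink = makeUiSink();
  await relayStream(fakeProvider(), { postMessage: (m) => sink.onMessage(m) });
  return sink.state();
}
```

In the real extension the port carries MESSAGE_KEYS.PROVIDER.STREAM_RESPONSE traffic and the completed text is persisted to the local chat store (step 6 above).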

4) Model Selection and Provider Routing

Model selection

  • The selected model key is persisted under the provider key path (STORAGE_KEYS.PROVIDER.SELECTED_MODEL), with reads falling back to the legacy key for backward compatibility.
  • Model list is built by querying all enabled providers in useProviderModels.
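The legacy-read pattern amounts to preferring the provider-scoped key and falling back to the old one. A minimal sketch, with assumed key strings (the real STORAGE_KEYS constants may differ):

```typescript
// Simple key/value view over extension storage (illustrative).
type KV = Record<string, string | undefined>;

const SELECTED_MODEL = "provider.selectedModel";      // assumed current key
const LEGACY_SELECTED_MODEL = "ollama.selectedModel"; // assumed legacy key

// Prefer the provider-scoped key; fall back to the legacy Ollama-era key.
function readSelectedModel(store: KV): string | undefined {
  return store[SELECTED_MODEL] ?? store[LEGACY_SELECTED_MODEL];
}
```

Writes go only to the new key, so the legacy key drains out naturally over time.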

Provider integration

  • Provider configs are persisted via ProviderManager (ProviderStorageKey.CONFIG).
  • Default profiles: Ollama, LM Studio, llama.cpp.
  • Per-model provider routing is stored via ProviderStorageKey.MODEL_MAPPINGS.
  • Background routing is performed by ProviderFactory.getProviderForModel(modelId).
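The routing lookup itself is a small decision: per-model mapping wins, otherwise a default provider applies. The sketch below shows that shape only; ProviderFactory's real signature and the default-provider choice are assumptions.

```typescript
// Provider ids matching the default profiles listed above.
type ProviderId = "ollama" | "lmstudio" | "llamacpp";

const DEFAULT_PROVIDER: ProviderId = "ollama"; // assumed default

// Simplified stand-in for ProviderFactory.getProviderForModel(modelId):
// the stored per-model mapping wins; otherwise fall back to the default.
function getProviderForModel(
  modelId: string,
  mappings: Record<string, ProviderId>
): ProviderId {
  return mappings[modelId] ?? DEFAULT_PROVIDER;
}
```

Keeping the mapping as plain data (ProviderStorageKey.MODEL_MAPPINGS) means routing changes need no code changes, at the cost of the naming-collision ambiguity noted in section 10.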

5) Streaming Architecture

Streaming occurs over extension runtime ports:

  • UI hook: src/features/chat/hooks/use-chat-stream.ts
  • Background handler: src/background/handlers/handle-chat-with-model.ts
  • Abort/cancel handling: abort-controller-registry

Why this design:

  • Runtime ports support continuous chunk delivery better than one-shot messages.
  • Cancel support is clean via AbortController scoped to active stream keys.
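The abort-controller-registry idea can be sketched as a map from stream key to AbortController; the class and method names here are illustrative, not the module's actual exports.

```typescript
// One AbortController per active stream key (e.g. a session or request id).
class AbortRegistry {
  private controllers = new Map<string, AbortController>();

  // Begin a stream: abort any stale stream under the same key first.
  start(key: string): AbortSignal {
    this.cancel(key);
    const c = new AbortController();
    this.controllers.set(key, c);
    return c.signal;
  }

  // Cancel an active stream; returns false if the key was not active.
  cancel(key: string): boolean {
    const c = this.controllers.get(key);
    if (!c) return false;
    c.abort();
    this.controllers.delete(key);
    return true;
  }

  // Normal completion: drop the controller without aborting.
  finish(key: string): void {
    this.controllers.delete(key);
  }
}
```

The returned signal is passed into the provider's fetch/stream call, so disconnecting the port or pressing stop tears down the network request as well.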

Tradeoff:

  • Message keys are provider-named (PROVIDER.*), but legacy OLLAMA.* aliases are retained for compatibility, which adds a layer of naming indirection.

6) Storage Architecture

Active runtime storage

  • Local SQL WASM storage (sql.js) for chat/session/embeddings data.
  • Settings/provider config: @plasmohq/storage (via plasmoGlobalStorage wrapper).
  • Export/restore uses ZIP bundles with versioned manifests.
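A versioned manifest lets restore reject bundles written by a newer schema. The field names and version constant below are assumptions to illustrate the check, not the actual manifest format.

```typescript
// Hypothetical shape of the ZIP bundle's manifest (fields assumed).
interface ExportManifest {
  schemaVersion: number;
  createdAt: string;  // ISO timestamp
  tables: string[];   // exported data sets
}

const CURRENT_SCHEMA_VERSION = 1; // assumed current version

// Restore only bundles at or below the schema version this build understands;
// newer bundles would require a migration this code does not have.
function canRestore(m: ExportManifest): boolean {
  return m.schemaVersion >= 1 && m.schemaVersion <= CURRENT_SCHEMA_VERSION;
}
```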

7) RAG/Embedding Architecture (Current)

  • Embeddings are generated via a browser-safe strategy chain.
  • Content is chunked and stored in the local SQL WASM store.
  • Query-time retrieval uses hybrid search with adaptive weighting.
  • Pipeline includes diversity filtering and recency/feedback score hooks.
  • Browser-first module contracts for the next refactor step are documented in src/lib/rag/core/interfaces.ts.
  • Embeddings use a fallback chain: provider-native -> shared model -> background warmup -> Ollama fallback.
  • Background model preparation currently performs pull operations only through Ollama handlers.
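The fallback chain reduces to "try each strategy in order, return the first vector that succeeds." A minimal sketch under assumed names (EmbedFn, embedWithFallback); the real strategy chain carries more context per step:

```typescript
// One embedding strategy: returns a vector, null (unavailable), or throws.
type EmbedFn = (text: string) => Promise<number[] | null>;

// Walk the chain (provider-native -> shared model -> background warmup ->
// Ollama fallback) and return the first successful embedding.
async function embedWithFallback(text: string, chain: EmbedFn[]): Promise<number[]> {
  for (const embed of chain) {
    try {
      const v = await embed(text);
      if (v) return v;
    } catch {
      // Strategy failed; fall through to the next one.
    }
  }
  throw new Error("all embedding strategies failed");
}

async function demoEmbed() {
  const chain: EmbedFn[] = [
    async () => { throw new Error("provider down"); }, // provider-native fails
    async () => null,                                  // shared model unavailable
    async () => [0.1, 0.2, 0.3],                       // fallback succeeds
  ];
  return embedWithFallback("hello", chain);
}
```

Because every strategy has the same signature, the chain's order can be changed (or a step disabled by a feature toggle) without touching callers.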

Important constraint:

  • There is no OCR pipeline and no WASM-based reranker in v0.6.2.

8) Why Background Worker Is Used

  • Keeps provider network I/O and long-running operations off UI thread.
  • Centralizes extension APIs that are unavailable or unsafe in UI contexts.
  • Simplifies cancellation and stream lifecycle tracking.

9) Tradeoffs and Architectural Decisions

  1. Legacy naming retained for compatibility

    • Pro: avoids migration breakage.
    • Con: causes confusion in multi-provider code paths.
  2. Dexie runtime + SQLite migration path

    • Pro: stable current UX with incremental migration work.
    • Con: two persistence strategies increase maintenance overhead.
  3. Provider-agnostic chat with provider-specific management features

    • Pro: fast rollout of multi-provider chat.
    • Con: uneven feature parity (pull/delete/version are Ollama-centric).
  4. Local retrieval pipeline over extension constraints

    • Pro: privacy-preserving retrieval.
    • Con: CSP/performance limits prevent full in-browser model/reranker parity.

10) Assumptions and Constraints

Assumptions:

  • User can run at least one provider endpoint.
  • Endpoint URLs are reachable from extension context.
  • Local resources are sufficient for selected models.

Constraints:

  • Chrome extension CSP limits some WASM/worker ML paths.
  • Firefox lacks Chrome DNR API behavior.
  • Provider model naming collisions can cause ambiguous mapping behavior.

11) Known Risks / Technical Debt

  • Legacy ollama-* keys retained for compatibility while provider naming becomes default
  • Partial provider parity in model-management actions
  • Dual persistence architecture during migration period
  • Retrieval quality depends on chunking/threshold tuning and model quality

12) Desktop Design Notes (Non-Implementation)

  • Provider abstraction (factory/manager/types) is intentionally runtime-agnostic and can be reused in a desktop app.
  • Provider identity metadata (icons/display names) should remain shared via src/lib/providers/registry.ts.
  • Browser-only APIs (DNR, extension messaging) are already isolated in background handlers and would map to Electron main-process equivalents.
  • Storage keys are provider-agnostic with legacy shims; a desktop app can reuse the same keys to migrate settings.

13) Near-Term Architecture Priorities

  1. Normalize provider-agnostic naming.
  2. Decide single source of truth for chat persistence.
  3. Expand provider parity for management actions.
  4. Improve retrieval observability and failure diagnostics.