Architecture

This document describes the current implementation as of v0.6.2 and highlights tradeoffs, assumptions, and known constraints.

1) Entry Points

Primary runtime entry points:

  • Sidepanel app: sidepanel.tsx -> src/sidepanel/app.tsx
  • Options app: options.tsx -> src/options/app.tsx
  • Background service worker: src/background/index.ts
  • Content scripts: src/contents/index.ts, src/contents/selection-button.tsx

These map to extension pages generated by WXT.

2) System Responsibilities

Sidepanel

  • Chat interaction UX
  • Session display and branch navigation
  • Streaming state updates
  • Local chat actions (edit, fork, delete, export)

Options

  • Provider configuration
  • Model parameters
  • Embedding/RAG configuration
  • Feature toggles and diagnostics

Background worker

  • Provider resolution and streaming orchestration
  • Model management handlers
  • Embedding generation handlers for file chunks
  • Browser-level APIs (DNR/CORS rules, context menu)

Content scripts

  • Selected-text capture
  • Page extraction entrypoints for browser context workflows

3) Data Flow (UI -> Background -> Provider -> Stream -> Storage)

  1. User sends prompt in sidepanel.
  2. UI opens a runtime port (MESSAGE_KEYS.PROVIDER.STREAM_RESPONSE) to background.
  3. Background receives CHAT_WITH_MODEL and resolves the provider through the stored model mapping.
  4. Provider starts streaming tokens back to background.
  5. Background relays chunks to UI through port messages.
  6. UI applies optimistic updates and persists completed messages in local chat store.
  7. Optional embedding pipelines index chat/file content for retrieval.
A Mermaid sketch of the same flow:

flowchart LR
    A["Sidepanel UI (React)"] --> B["Runtime Port (STREAM_RESPONSE)"]
    B --> C["Background Worker"]
    C --> D["ProviderFactory resolve by model mapping"]
    D --> E["Ollama Provider"]
    D --> F["LM Studio Provider"]
    D --> G["llama.cpp Provider"]
    E --> H["Chunk Stream"]
    F --> H
    G --> H
    H --> I["UI Stream State Update"]
    I --> J["Dexie Chat Store (IndexedDB)"]
    I --> K["Optional RAG Pipeline"]
    K --> L["Embedding Strategy Chain"]
    L --> M["Vector DB (Dexie IndexedDB)"]
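The relay step (background receives provider tokens, forwards them over the port, UI accumulates them) can be sketched as below. This is an illustrative simulation, not the real API: StreamChunk, PortLike, relayStream, and makeUiSink are assumed names, and a plain object stands in for the chrome.runtime port.

```typescript
// Message shape flowing over the runtime port (assumed, not the real types).
type StreamChunk =
  | { type: "chunk"; content: string }
  | { type: "done" }
  | { type: "error"; message: string };

// Minimal stand-in for one end of a chrome.runtime.Port.
interface PortLike {
  postMessage(msg: StreamChunk): void;
}

// Background side: relay provider tokens to the UI port, then signal completion.
async function relayStream(tokens: AsyncIterable<string>, port: PortLike): Promise<void> {
  try {
    for await (const t of tokens) port.postMessage({ type: "chunk", content: t });
    port.postMessage({ type: "done" });
  } catch (e) {
    port.postMessage({ type: "error", message: String(e) });
  }
}

// UI side: accumulate chunks as optimistic state updates.
function makeUiSink() {
  let text = "";
  let finished = false;
  return {
    onMessage(msg: StreamChunk) {
      if (msg.type === "chunk") text += msg.content;
      if (msg.type === "done") finished = true;
    },
    state() {
      return { text, finished };
    },
  };
}

// Fake provider emitting two tokens, wired end to end.
async function* fakeProvider(): AsyncGenerator<string> {
  yield "Hel";
  yield "lo";
}

async function demo() {
  const sink = makeUiSink();
  await relayStream(fakeProvider(), { postMessage: (m) => sink.onMessage(m) });
  return sink.state();
}
```

In the real extension the port carries MESSAGE_KEYS.PROVIDER.STREAM_RESPONSE traffic and the completed text is persisted to the local chat store (step 6 above).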

4) Model Selection and Provider Routing

Model selection

  • The selected model key is persisted under the provider key path (STORAGE_KEYS.PROVIDER.SELECTED_MODEL), with reads falling back to the legacy key for backward compatibility.
  • Model list is built by querying all enabled providers in useProviderModels.
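The legacy-read pattern amounts to preferring the provider-scoped key and falling back to the old one. A minimal sketch, with assumed key strings (the real STORAGE_KEYS constants may differ):

```typescript
// Simple key/value view over extension storage (illustrative).
type KV = Record<string, string | undefined>;

const SELECTED_MODEL = "provider.selectedModel";      // assumed current key
const LEGACY_SELECTED_MODEL = "ollama.selectedModel"; // assumed legacy key

// Prefer the provider-scoped key; fall back to the legacy Ollama-era key.
function readSelectedModel(store: KV): string | undefined {
  return store[SELECTED_MODEL] ?? store[LEGACY_SELECTED_MODEL];
}
```

Writes go only to the new key, so the legacy key drains out naturally over time.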

Provider integration

  • Provider configs are persisted via ProviderManager (ProviderStorageKey.CONFIG).
  • Default profiles: Ollama, LM Studio, llama.cpp.
  • Per-model provider routing is stored via ProviderStorageKey.MODEL_MAPPINGS.
  • Background routing is performed by ProviderFactory.getProviderForModel(modelId).
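The routing lookup itself is a small decision: per-model mapping wins, otherwise a default provider applies. The sketch below shows that shape only; ProviderFactory's real signature and the default-provider choice are assumptions.

```typescript
// Provider ids matching the default profiles listed above.
type ProviderId = "ollama" | "lmstudio" | "llamacpp";

const DEFAULT_PROVIDER: ProviderId = "ollama"; // assumed default

// Simplified stand-in for ProviderFactory.getProviderForModel(modelId):
// the stored per-model mapping wins; otherwise fall back to the default.
function getProviderForModel(
  modelId: string,
  mappings: Record<string, ProviderId>
): ProviderId {
  return mappings[modelId] ?? DEFAULT_PROVIDER;
}
```

Keeping the mapping as plain data (ProviderStorageKey.MODEL_MAPPINGS) means routing changes need no code changes, at the cost of the naming-collision ambiguity noted in section 10.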

5) Streaming Architecture

Streaming occurs over extension runtime ports:

  • UI hook: src/features/chat/hooks/use-chat-stream.ts
  • Background handler: src/background/handlers/handle-chat-with-model.ts
  • Abort/cancel handling: abort-controller-registry

Why this design:

  • Runtime ports support continuous chunk delivery better than one-shot messages.
  • Cancel support is clean via AbortController scoped to active stream keys.
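The abort-controller-registry idea can be sketched as a map from stream key to AbortController; the class and method names here are illustrative, not the module's actual exports.

```typescript
// One AbortController per active stream key (e.g. a session or request id).
class AbortRegistry {
  private controllers = new Map<string, AbortController>();

  // Begin a stream: abort any stale stream under the same key first.
  start(key: string): AbortSignal {
    this.cancel(key);
    const c = new AbortController();
    this.controllers.set(key, c);
    return c.signal;
  }

  // Cancel an active stream; returns false if the key was not active.
  cancel(key: string): boolean {
    const c = this.controllers.get(key);
    if (!c) return false;
    c.abort();
    this.controllers.delete(key);
    return true;
  }

  // Normal completion: drop the controller without aborting.
  finish(key: string): void {
    this.controllers.delete(key);
  }
}
```

The returned signal is passed into the provider's fetch/stream call, so disconnecting the port or pressing stop tears down the network request as well.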

Tradeoff:

  • Message keys are provider-named (PROVIDER.*), but legacy OLLAMA.* aliases are retained for compatibility, which adds a layer of naming indirection.

6) Storage Architecture

Active runtime storage

  • Local SQL WASM storage (sql.js) for chat/session/embeddings data.
  • Settings/provider config: @plasmohq/storage (via plasmoGlobalStorage wrapper).
  • Export/restore uses ZIP bundles with versioned manifests.
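A versioned manifest lets restore reject bundles written by a newer schema. The field names and version constant below are assumptions to illustrate the check, not the actual manifest format.

```typescript
// Hypothetical shape of the ZIP bundle's manifest (fields assumed).
interface ExportManifest {
  schemaVersion: number;
  createdAt: string;  // ISO timestamp
  tables: string[];   // exported data sets
}

const CURRENT_SCHEMA_VERSION = 1; // assumed current version

// Restore only bundles at or below the schema version this build understands;
// newer bundles would require a migration this code does not have.
function canRestore(m: ExportManifest): boolean {
  return m.schemaVersion >= 1 && m.schemaVersion <= CURRENT_SCHEMA_VERSION;
}
```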

7) RAG/Embedding Architecture (Current)

  • Embeddings are generated via a browser-safe strategy chain.
  • Content is chunked and stored in the local SQL WASM store.
  • Query-time retrieval uses hybrid search with adaptive weighting.
  • Pipeline includes diversity filtering and recency/feedback score hooks.
  • Browser-first module contracts for the next refactor step are documented in src/lib/rag/core/interfaces.ts.
  • Embeddings use a fallback chain: provider-native -> shared model -> background warmup -> Ollama fallback.
  • Background model preparation currently performs pull operations only through Ollama handlers.
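The fallback chain reduces to "try each strategy in order, return the first vector that succeeds." A minimal sketch under assumed names (EmbedFn, embedWithFallback); the real strategy chain carries more context per step:

```typescript
// One embedding strategy: returns a vector, null (unavailable), or throws.
type EmbedFn = (text: string) => Promise<number[] | null>;

// Walk the chain (provider-native -> shared model -> background warmup ->
// Ollama fallback) and return the first successful embedding.
async function embedWithFallback(text: string, chain: EmbedFn[]): Promise<number[]> {
  for (const embed of chain) {
    try {
      const v = await embed(text);
      if (v) return v;
    } catch {
      // Strategy failed; fall through to the next one.
    }
  }
  throw new Error("all embedding strategies failed");
}

async function demoEmbed() {
  const chain: EmbedFn[] = [
    async () => { throw new Error("provider down"); }, // provider-native fails
    async () => null,                                  // shared model unavailable
    async () => [0.1, 0.2, 0.3],                       // fallback succeeds
  ];
  return embedWithFallback("hello", chain);
}
```

Because every strategy has the same signature, the chain's order can be changed (or a step disabled by a feature toggle) without touching callers.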

Important constraint:

  • There is no OCR pipeline and no WASM-based reranker in v0.6.2.

8) Why Background Worker Is Used

  • Keeps provider network I/O and long-running operations off UI thread.
  • Centralizes extension APIs that are unavailable or unsafe in UI contexts.
  • Simplifies cancellation and stream lifecycle tracking.

9) Tradeoffs and Architectural Decisions

  1. Legacy naming retained for compatibility

    • Pro: avoids migration breakage.
    • Con: causes confusion in multi-provider code paths.
  2. Dexie runtime + SQLite migration path

    • Pro: stable current UX with incremental migration work.
    • Con: two persistence strategies increase maintenance overhead.
  3. Provider-agnostic chat with provider-specific management features

    • Pro: fast rollout of multi-provider chat.
    • Con: uneven feature parity (pull/delete/version are Ollama-centric).
  4. Local retrieval pipeline over extension constraints

    • Pro: privacy-preserving retrieval.
    • Con: CSP/performance limits prevent full in-browser model/reranker parity.

10) Assumptions and Constraints

Assumptions:

  • User can run at least one provider endpoint.
  • Endpoint URLs are reachable from extension context.
  • Local resources are sufficient for selected models.

Constraints:

  • Chrome extension CSP limits some WASM/worker ML paths.
  • Firefox lacks Chrome DNR API behavior.
  • Provider model naming collisions can cause ambiguous mapping behavior.

11) Known Risks / Technical Debt

  • Legacy ollama-* keys retained for compatibility while provider naming becomes default
  • Partial provider parity in model-management actions
  • Dual persistence architecture during migration period
  • Retrieval quality depends on chunking/threshold tuning and model quality

12) Desktop Design Notes (Non-Implementation)

  • Provider abstraction (factory/manager/types) is intentionally runtime-agnostic and can be reused in a desktop app.
  • Provider identity metadata (icons/display names) should remain shared via src/lib/providers/registry.ts.
  • Browser-only APIs (DNR, extension messaging) are already isolated in background handlers and would map to Electron main-process equivalents.
  • Storage keys are provider-agnostic with legacy shims; a desktop app can reuse the same keys to migrate settings.

13) Near-Term Architecture Priorities

  1. Normalize provider-agnostic naming.
  2. Decide single source of truth for chat persistence.
  3. Expand provider parity for management actions.
  4. Improve retrieval observability and failure diagnostics.