
Provider Setup

The recommended setup uses Ollama as the primary local provider, with LM Studio and llama.cpp as local alternatives. vLLM, LocalAI, and KoboldCPP also work once you configure their endpoints.

Install Ollama Client from the Chrome Web Store.

| Provider | Default endpoint | Notes |
| --- | --- | --- |
| Ollama | http://localhost:11434 | Recommended baseline. Fullest model-management support. |
| LM Studio | http://localhost:1234/v1 | OpenAI-compatible chat and embeddings with LM Studio model discovery. |
| llama.cpp server | http://localhost:8000/v1 | OpenAI-compatible. Run with llama-server. |
| vLLM / LocalAI / KoboldCPP | User configured | OpenAI-compatible servers; use your actual URL. |

Install Ollama from ollama.com, then start it:

ollama serve

Pull at least one chat model:

ollama pull qwen2.5:3b

For tool calling and image input, choose a model that actually supports those capabilities. The extension detects reported capabilities where providers expose them, and lets you override them from the model menu when a provider cannot report them.

Pull one embeddings model for RAG:

ollama pull all-minilm:latest

You need at least one chat model and one embeddings model installed for the full experience.
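
One way to confirm this requirement is to check the output of `ollama list`. The sketch below assumes the first column of that output is the model name including its tag; `has_model` is a hypothetical helper for illustration, not part of Ollama or the extension.

```shell
# Read `ollama list` output on stdin and succeed if the model named $1 is listed.
# Assumption: line 1 is a header row and column 1 holds "name:tag".
has_model() {
  awk -v want="$1" 'NR > 1 && $1 == want { found = 1 } END { exit !found }'
}
```

For example, `ollama list | has_model qwen2.5:3b && echo "chat model installed"` prints the message only when that model is present. The same check with `all-minilm:latest` covers the embeddings model.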

  1. Open the extension’s options page.
  2. Go to the Providers tab.
  3. Enable the providers you want.
  4. Set the base URL and run a connection test.
  5. Pick a model from the chat model menu.

To verify that each provider responds, query its model list from a terminal:

# Ollama
curl http://localhost:11434/api/tags
# LM Studio
curl http://localhost:1234/v1/models
# llama.cpp
curl http://localhost:8000/v1/models

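
The checks above can be wrapped in a small script so that every configured provider is probed the same way. This is a minimal sketch, assuming `curl` is installed and the default endpoints from the provider table; `models_url` and `check_provider` are hypothetical names chosen for illustration.

```shell
# Build the model-listing URL for a provider base URL.
# Ollama's native API lists models at /api/tags; OpenAI-compatible
# servers list them at <base>/models, where the base already ends in /v1.
models_url() {
  provider=$1
  base=${2%/}   # drop one trailing slash, if present
  case "$provider" in
    ollama) printf '%s/api/tags\n' "$base" ;;
    *)      printf '%s/models\n' "$base" ;;
  esac
}

# Probe one provider and print whether its model list is reachable.
check_provider() {
  url=$(models_url "$1" "$2")
  if curl --silent --fail --max-time 3 --output /dev/null "$url"; then
    printf '%s: ok (%s)\n' "$1" "$url"
  else
    printf '%s: unreachable (%s)\n' "$1" "$url"
  fi
}
```

A typical call would be `check_provider ollama http://localhost:11434` or `check_provider lmstudio http://localhost:1234/v1`. Any provider name other than `ollama` is treated as OpenAI-compatible.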
  • Chat generation is fully provider-agnostic.
  • Image input is model-dependent. If the selected model is not vision-capable, the composer blocks image attach instead of sending unsupported input.
  • Tool calling is model-dependent. Tool-capable models can inspect browser context through local extension tools; models without tool support use the plain chat path.
  • Model-management actions depend on provider capabilities. Ollama has the fullest support; LM Studio adds pull/unload support.
  • Embedding generation uses the configured provider when it supports embeddings; otherwise it falls back through the shared embedding path and Ollama for reliability.

Chromium-based browsers route extension requests through the Declarative Net Request (DNR) API. Firefox uses a different extension API model.

  • Confirm the provider process is actually running.
  • Confirm the endpoint URL matches the runtime URL exactly (port, scheme, /v1 suffix).
  • Use the Test connection button in Providers settings before debugging model behavior.
  • Check the background console (chrome://extensions → service worker) for streaming or provider errors.
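
The endpoint check in the list above (port, scheme, /v1 suffix) can be partly automated for OpenAI-compatible providers. The following is a heuristic sketch only: `lint_base_url` is a hypothetical helper, and it does not apply to Ollama's native endpoint, which has no /v1 suffix.

```shell
# Report common mistakes in an OpenAI-compatible base URL.
# Prints one line per problem and returns non-zero if any were found.
lint_base_url() {
  url=$1
  problems=0
  case "$url" in
    http://*|https://*) ;;
    *) echo "missing http:// or https:// scheme"; problems=1 ;;
  esac
  case "${url%/}" in
    */v1) ;;
    *) echo "missing /v1 suffix"; problems=1 ;;
  esac
  # Strip the scheme, then keep only host[:port] before the first "/".
  hostport=${url#*://}
  hostport=${hostport%%/*}
  case "$hostport" in
    *:[0-9]*) ;;
    *) echo "no explicit port"; problems=1 ;;
  esac
  return "$problems"
}
```

Running `lint_base_url http://localhost:1234/v1` prints nothing and succeeds, while `lint_base_url localhost:1234` reports the missing scheme and suffix. A clean result only means the URL is well-formed; the Test connection button remains the real check.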