Backends and Configuration

Prerequisites: pip install mellea, Ollama for local inference or appropriate credentials for cloud backends.

A backend is the engine that runs the LLM. Mellea ships with backends for Ollama, OpenAI-compatible APIs, LiteLLM, HuggingFace transformers, and IBM WatsonX. You configure the backend when you create a session.

Default backend

start_session() defaults to Ollama with IBM Granite 4 Micro (granite4.1:3b). No API keys needed — just have Ollama running:

# Requires: mellea
# Returns: MelleaSession
import mellea

m = mellea.start_session()

Backend comparison

The following table shows all available backends, their class names, import paths, required extras, and start_session() configuration:

Backend	Class	Import	Required extras	`start_session()`
Ollama	`OllamaModelBackend`	`mellea.backends.ollama`	base	`backend_name="ollama"`
OpenAI	`OpenAIBackend`	`mellea.backends.openai`	base	`backend_name="openai"`
LiteLLM	`LiteLLMBackend`	`mellea.backends.litellm`	`mellea[litellm]`	`backend_name="litellm"`
HuggingFace	`LocalHFBackend`	`mellea.backends.huggingface`	`mellea[hf]`	`backend_name="hf"`
WatsonX	`WatsonxAIBackend`	`mellea.backends.watsonx`	`mellea[watsonx]`	`backend_name="watsonx"` (deprecated)

Note: Vertex AI uses the LiteLLM backend with appropriate model IDs. See the Vertex AI integration for details. For detailed setup instructions, click the backend name in the table above.

Switching the model

Pass any model string your backend supports:

# Requires: mellea
# Returns: MelleaSession
import mellea

m = mellea.start_session(model_id="llama3.2:3b")

Use model_ids constants for known models:

# Requires: mellea
# Returns: MelleaSession
from mellea import start_session
from mellea.backends import model_ids

m = start_session(model_id=model_ids.IBM_GRANITE_3_3_8B)

OpenAI backend

Backend note: This section requires pip install mellea (no extras needed — the OpenAI client is included). Needs a valid api_key for the OpenAI API; local endpoints such as LM Studio and Ollama's OpenAI endpoint do not require a real key.

Use any OpenAI-compatible API — OpenAI itself, LM Studio, vLLM, or Ollama's OpenAI-compatible endpoint:

# Requires: mellea
# Returns: MelleaSession
from mellea import MelleaSession
from mellea.backends.openai import OpenAIBackend
from mellea.stdlib.context import ChatContext

# OpenAI API
m = MelleaSession(
    OpenAIBackend(model_id="gpt-4o", api_key="sk-..."),
    ctx=ChatContext(),
)

# Requires: mellea
# Returns: MelleaSession
from mellea import MelleaSession
from mellea.backends.openai import OpenAIBackend

# LM Studio (local, no real key needed)
m = MelleaSession(
    OpenAIBackend(model_id="qwen2.5vl:7b", base_url="http://127.0.0.1:1234/v1"),
)

# Ollama via OpenAI-compatible endpoint
m = MelleaSession(
    OpenAIBackend(
        model_id="qwen2.5vl:7b",
        base_url="http://localhost:11434/v1",
        api_key="ollama",
    ),
)

LiteLLM backend

Backend note: Requires pip install "mellea[litellm]". Provider-specific environment variables must be set (e.g., AWS_BEARER_TOKEN_BEDROCK for Bedrock). See the LiteLLM docs for your provider's setup.

LiteLLM provides unified access to 100+ providers — Anthropic, AWS Bedrock, Azure, and more:

# Requires: mellea[litellm]
# Returns: str
import mellea

m = mellea.start_session(
    backend_name="litellm",
    model_id="bedrock/converse/us.amazon.nova-pro-v1:0",
)
result = m.chat("Give me three facts about the Amazon rainforest.")
print(str(result))
# Output will vary — LLM responses depend on model and temperature.

HuggingFace backend

Backend note: Requires pip install "mellea[hf]". Models are downloaded from HuggingFace Hub on first use. GPU recommended for reasonable inference speed. Required for Intrinsics.

Run models locally using HuggingFace transformers:

# Requires: mellea[hf]
# Returns: MelleaSession
from mellea import MelleaSession
from mellea.backends.huggingface import LocalHFBackend

backend = LocalHFBackend(model_id="ibm-granite/granite-4.1-3b")
m = MelleaSession(backend=backend)

WatsonX backend

Deprecated: The native WatsonX backend is deprecated. Use the LiteLLM or OpenAI backend with a WatsonX-compatible endpoint instead. See IBM WatsonX integration for the recommended setup.

Model options

ModelOption provides backend-agnostic keys for common generation parameters. Options set at session level apply to all calls; options passed to instruct() or chat() apply to that call only and take precedence:

# Requires: mellea
# Returns: str
from mellea import MelleaSession
from mellea.backends import ModelOption
from mellea.backends.ollama import OllamaModelBackend

# Set seed for all calls in this session
m = MelleaSession(
    backend=OllamaModelBackend(model_options={ModelOption.SEED: 42})
)

# Override temperature and token limit for a single call
answer = m.instruct(
    "What is 2 × 2?",
    model_options={
        ModelOption.TEMPERATURE: 0.5,
        ModelOption.MAX_NEW_TOKENS: 15,
    },
)
print(str(answer))
# Output will vary — LLM responses depend on model and temperature.

Available ModelOption constants:

Constant	Description
`ModelOption.TEMPERATURE`	Sampling temperature
`ModelOption.MAX_NEW_TOKENS`	Maximum tokens to generate
`ModelOption.SEED`	Random seed for reproducibility
`ModelOption.SYSTEM_PROMPT`	System prompt override
`ModelOption.THINKING`	Enable thinking / reasoning mode
`ModelOption.STREAM`	Enable streaming output
`ModelOption.TOOLS`	List of tools available to the model
`ModelOption.CONTEXT_WINDOW`	Context window size

You can also pass raw backend-native keys alongside ModelOption constants. If the same parameter is specified both ways, ModelOption takes precedence.

System prompt

ModelOption.SYSTEM_PROMPT is the recommended way to set a system message. It is translated correctly for all backends regardless of how each provider serializes the system role:

# Requires: mellea
# Returns: str
from mellea import start_session
from mellea.backends import ModelOption

m = start_session(model_options={ModelOption.SYSTEM_PROMPT: "You are a concise assistant."})
reply = m.chat("What is the capital of France?")
print(str(reply))
# Output will vary — LLM responses depend on model and temperature.

Direct backend construction

For full control, construct the backend and pass it to MelleaSession directly:

# Requires: mellea
# Returns: MelleaSession
import mellea
from mellea.backends.ollama import OllamaModelBackend
from mellea.stdlib.context import ChatContext

backend = OllamaModelBackend(model_id="phi4-mini:latest")
m = mellea.MelleaSession(backend=backend, ctx=ChatContext())

start_session() accepts the same arguments as keyword parameters:

# Requires: mellea
# Returns: MelleaSession
import mellea
from mellea.backends import ModelOption
from mellea.stdlib.context import ChatContext

m = mellea.start_session(
    backend_name="ollama",
    model_id="phi4-mini:latest",
    ctx=ChatContext(),
    model_options={ModelOption.TEMPERATURE: 0.1},
)

Valid backend_name values: "ollama", "openai", "hf", "litellm", "watsonx".

See also: Configure Model Options | Integrations

Default backend​

Backend comparison​

Switching the model​

OpenAI backend​

LiteLLM backend​

HuggingFace backend​

WatsonX backend​

Model options​

System prompt​

Direct backend construction​