Glossary

Mellea-specific terms used throughout this guide. Terms are listed alphabetically. Cross-links from guide pages point here on first use only.

act() / aact()

act() is the generic session method that runs any Component and returns a result. Every higher-level method (instruct(), chat(), query(), transform()) builds a Component and delegates to act(). Use act() directly when working with custom components or building your own inference loops.

aact() is the async counterpart — same signature, same return types.

See: act() and aact()

aLoRA (Activated LoRA)

An Activated LoRA (aLoRA) is a LoRA adapter dynamically loaded by LocalHFBackend at inference time to serve as a lightweight requirement verifier. Instead of running a full LLM call to check a requirement, the adapter is activated on the same model weights already in memory. Granite Switch models embed these adapters directly in the model weights, enabling intrinsic functions via OpenAIBackend without runtime adapter loading.

See: LoRA and aLoRA Adapters

@generative

A decorator that converts a typed Python function into an AI-powered function. @generative uses the function's name, docstring, parameters, and return type annotation to instruct the LLM. The output is constrained to match the return type. Write the function in idiomatic Python — the more natural the signature and docstring, the better the model understands and imitates it.

from mellea import generative, start_session

@generative
def classify_language(code: str) -> str:
    """Return the programming language of the code snippet."""
    ...

m = start_session()
lang = classify_language(m, code="print('hello')")

See: Generative Functions

Backend

A backend is an inference engine that Mellea uses to run LLM calls. Examples: OllamaModelBackend, OpenAIBackend, LocalHFBackend, WatsonxAIBackend. Backends are configured via MelleaSession or start_session().

See: Backends and Configuration

ChatContext

The standard multi-turn context implementation. ChatContext accumulates the full conversation history and passes it to the backend on each call. Create one at the start of a session and pass it through all calls to maintain state:

from mellea.stdlib.context import ChatContext
ctx = ChatContext()

Use window_size to cap how many turns are sent to the backend:

ctx = ChatContext(window_size=10)

Use SimpleContext instead for stateless, single-turn calls.

See: Context and Sessions

CBlock

A CBlock (content block) is the low-level unit of content in Mellea. A CBlock holds text (or image data) and is assembled by a Component into the prompt sent to the backend. Multiple CBlocks compose into a single LLM request.

See: Mellea Core Internals

Component

A Component is a reusable, composable unit in Mellea that encapsulates a prompt structure, its requirements, and its parsing logic. Instruction, Message, MObject, and Document are all Component subclasses. Components are the building blocks of generative programs.

See: Building Custom Components

ComponentParseError

The exception raised by Component.parse() when the model's output cannot be parsed into the component's declared return type S. parse() catches any exception from _parse() and re-raises it as ComponentParseError so all callers get a consistent error type regardless of the underlying parse implementation.

from mellea.core import ComponentParseError

try:
    result = form.parse(thunk)
except ComponentParseError as e:
    print(f"Parsing failed: {e}")

See: Building Custom Components

ContextTurn

A single turn of model input and model output stored inside a Context. Each call to m.instruct(), m.chat(), or m.act() appends a ContextTurn to the active context. Turns are consumed by the backend formatter to build the conversation history sent to the model.

Context

A Context holds the conversation history threaded through a MelleaSession. Mellea provides SimpleContext (single-turn) and ChatContext (multi-turn). Push and pop operations let you branch and restore context state across calls.

See: Context and Sessions

Document

A Component that wraps a plain-text reference document for inclusion in a prompt. Pass one or more Document objects in the _docs field of a Message or directly as grounding context in an Instruction. Unlike RichDocument, Document holds pre-extracted text rather than a parsed file.

from mellea.stdlib.components.docs.document import Document
doc = Document(text="...", title="My doc", doc_id="ref-1")

Generative function

A Python function decorated with @generative. Mellea uses the function's type annotation as the output schema and its docstring as the prompt. Generative functions are called with a MelleaSession as the first argument and return the annotated type.

See: Generative Functions

Generative program

Any computer program that contains calls to an LLM. Mellea is a library for writing robust, composable generative programs.

See: Generative Programming

GenerateLog

A dataclass that captures a single model call in detail. Pass a list[GenerateLog] to m.validate() via the generate_logs= parameter to record the judge prompt and raw verdict for each requirement validation:

from mellea import start_session
from mellea.core import GenerateLog
from mellea.stdlib.requirements import req

logs: list[GenerateLog] = []
m = start_session()
result = m.instruct("Summarise this text.")
m.validate([req("Must be under 30 words.")], generate_logs=logs)

for log in logs:
    print(log.prompt)   # full judge prompt sent to the model
    print(log.result.value if log.result else None)  # raw verdict string

Key fields: prompt, result (ModelOutputThunk | None), backend, model_options, is_final_result.

See: Evaluate with LLM-as-a-Judge

Hook / HookType

A hook is an async function that intercepts a specific event in Mellea's execution pipeline. The @hook decorator marks a function as a hook handler, binding it to a HookType — one of 17 named events spanning session lifecycle, component execution, generation, validation, sampling, and tool invocation.

Hooks receive a frozen payload and read-only context, and return None (pass through), a modify() result (alter the payload), or a block() result (reject the operation).

from mellea.plugins import HookType, hook

@hook(HookType.GENERATION_PRE_CALL)
async def my_hook(payload, ctx):
    ...

See: Plugins & Hooks

grounding_context

The grounding_context parameter of m.instruct() accepts a dictionary of named text entries that Mellea injects into the prompt as grounding evidence. Each entry is tracked as a separate context component, so it can be traced and rendered independently from the instruction template.

Use grounding_context to anchor the model's output to retrieved documents, knowledge-base passages, or any reference material — without mixing that content into user_variables:

answer = m.instruct(
    "Answer the question: {{question}}",
    user_variables={"question": query},
    grounding_context={"doc0": doc_text_0, "doc1": doc_text_1},
)

Without grounding_context, m.instruct() generates from the model's parametric knowledge only. It is the primary integration point for RAG pipelines.

See: Build a RAG Pipeline

CRITERIA_BANK

A dictionary mapping short string keys to full criteria descriptions used by guardian_check(). Pass a key directly as the criteria argument — the function looks up the full description automatically.

Available keys: "harm", "social_bias", "jailbreak", "profanity", "unethical_behavior", "violence", "groundedness", "answer_relevance", "context_relevance", "function_call".

from mellea.stdlib.components.intrinsic.guardian import CRITERIA_BANK

print(list(CRITERIA_BANK.keys()))

See: Safety Guardrails

SCORING_SCHEMA_BANK

A dictionary mapping short string keys to scoring-schema sentences used by guardian_check(). Pass a key as the scoring_schema argument to control which span Guardian evaluates.

Available keys: "assistant_response" (default), "user_prompt", "last_turn", "tool_call". Custom strings are also accepted but must resolve to a yes/no verdict.

from mellea.stdlib.components.intrinsic.guardian import SCORING_SCHEMA_BANK

print(list(SCORING_SCHEMA_BANK.keys()))

The deprecated target_role="user" | "assistant" argument is now superseded by scoring_schema="user_prompt" | "assistant_response".

See: Safety Guardrails

factuality_correction()

A Guardian Intrinsic function that generates a corrected version of the assistant's last response grounded in the documents provided in context. Returns whatever the model emits as a str — typically the corrected text. The model may emit "none" when no correction is needed, but this is a model-side convention, not part of the API contract; gate calls on a positive factuality_detection() result.

from mellea.stdlib.components.intrinsic.guardian import factuality_correction

See: Safety Guardrails

factuality_detection()

A Guardian Intrinsic function that evaluates whether the assistant's last response is factually consistent with the documents in context. Returns "yes" if the response contains factual errors, or "no" if it is consistent.

from mellea.stdlib.components.intrinsic.guardian import factuality_detection

See: Safety Guardrails

guardian_check()

A Guardian Intrinsic function that evaluates the last message from a given role in a ChatContext against a safety or quality criterion. Returns a float score from 0.0 (no risk) to 1.0 (risk detected); values at or above 0.5 indicate risk.

Accepts any key from CRITERIA_BANK or a custom free-text criteria string.

from mellea.stdlib.components.intrinsic import guardian

score = guardian.guardian_check(context, backend, criteria="harm")

See: Safety Guardrails

GuardianCheck

Deprecated as of v0.4. Use guardian_check(), policy_guardrails(), or factuality_detection() from the Guardian Intrinsics instead. See Safety Guardrails.

A deprecated Requirement subclass that validates LLM outputs using a separately loaded Granite Guardian model. Requires an independent Ollama or HuggingFace backend for the Guardian model.

See: Safety Guardrails

GuardianRisk

Deprecated as of v0.4. Use CRITERIA_BANK string keys with guardian_check() instead.

An enum specifying which safety risk category the deprecated GuardianCheck class should detect. Replaced by the string keys in CRITERIA_BANK.

See: Safety Guardrails

policy_guardrails()

A Guardian Intrinsic function that checks whether a scenario complies with a natural-language policy. Returns "Yes" (compliant), "No" (non-compliant), or "Ambiguous" (insufficient information to decide).

from mellea.stdlib.components.intrinsic.guardian import policy_guardrails

See: Safety Guardrails

Granite Switch

A Granite model variant with LoRA and aLoRA adapters pre-baked into the model weights. When served via vLLM and accessed through OpenAIBackend with load_embedded_adapters=True, these embedded adapters enable Intrinsics (RAG quality checks, requirement validation, safety evaluation) without runtime adapter loading. Only intrinsics embedded in the model are available — check the model's adapter_index.json.

See: Official Granite Switch Documentation | Intrinsics | OpenAI and OpenAI-Compatible APIs

KV smashing

The technique of concatenating key-value attention caches from separately prefilled prompt chunks along the time axis, producing a single merged DynamicCache that covers the full context. Used by LocalHFBackend to avoid re-running forward passes on content that has already been cached.

When a prompt contains a mix of cached and uncached CBlock objects, Mellea prefills each block independently, then smashes the resulting caches together before generation — giving results identical to a single full-context forward pass at a fraction of the prefill cost.

See: Prefix Caching and KV Blocks

LiteLLM / LiteLLMBackend

LiteLLMBackend wraps LiteLLM — a unified interface over 100+ model providers. Use it to reach providers not covered by Mellea's native backends: Bedrock via IAM, Vertex AI, Together AI, Cohere, and others.

pip install 'mellea[litellm]'

m = mellea.start_session(
    backend_name="litellm",
    model_id="bedrock/converse/us.amazon.nova-pro-v1:0",
)

See: Backends and Configuration

LLM-as-a-judge

The default validation strategy for req() in Mellea. After the model generates an output, a second LLM call is made using the requirement's description as the evaluation criterion. Mellea converts the judge's response to True / False by looking for "yes" (case-insensitive) in the reply.

Use simple_validate instead when the criterion is deterministic (word count, regex, type check) — no second LLM call is needed.

See: Evaluate with LLM-as-a-Judge

ImageBlock

A Mellea type that represents an image in a backend-agnostic, encoded form. Use ImageBlock.from_pil_image(pil_image) to convert a Pillow Image object into an ImageBlock. Both raw PIL images and ImageBlock objects are accepted in the images=[...] parameter of instruct() and chat().

Use ImageBlock when you need an already-encoded representation, or when the PIL image is not directly available (e.g., passing between functions or caching).

See: Use Images and Vision Models

Intrinsic

An Intrinsic is a backend-level primitive in Mellea — a structured generation operation with special handling (e.g., constrained decoding, RAG retrieval). LocalHFBackend exposes Intrinsics via runtime adapter loading. OpenAIBackend supports Intrinsics when backed by a Granite Switch model with load_embedded_adapters=True.

See: Intrinsics

Instruction

The core Component in the IVR loop. An Instruction wraps a prompt description, optional requirements, in-context examples, and grounding context into a single object that m.act() can execute. m.instruct() is a convenience wrapper that builds an Instruction for you.

from mellea.stdlib.components.instruction import Instruction
instr = Instruction(
    description="Summarise the following text: {{text}}",
    requirements=[req("Must be under 50 words.")],
    user_variables={"text": "..."},
)
result = m.act(instr)

IVR (Instruct-Validate-Repair)

A core generative programming pattern in Mellea:

Instruct — call the LLM with a prompt.
Validate — check the output against a Requirement.
Repair — if validation fails, retry or fix the output.

See: Instruct, Validate, Repair

MelleaSession

The primary entry point for Mellea. A MelleaSession wraps a backend and provides instruct(), chat(), act(), aact(), query(), and transform() as session-level methods. Use mellea.start_session() to create one with defaults.

import mellea
m = mellea.start_session()  # returns a MelleaSession

mify / @mify

The @mify decorator turns any Python class into an MObject — an LLM-queryable, tool-accessible wrapper around your data. You specify which fields and methods are visible to the LLM; everything else remains hidden.

See: MObjects and mify

MObject

An MObject is a Python class decorated with @mify. It wraps existing data objects so they can be queried and transformed by the LLM via m.query() and m.transform(). Unlike @generative, @mify does not change the class's Python interface — it adds a layer that the LLM can see and call.

See: MObjects and mify

ModelOption

An enum (mellea.backends.ModelOption) of backend-agnostic inference options: TEMPERATURE, SEED, MAX_NEW_TOKENS, SYSTEM_PROMPT, etc. Using ModelOption keys ensures the same options work across all backends.

from mellea.backends import ModelOption

See: Configure Model Options

ModelOutputThunk

The return type of m.instruct(), m.act(), and most session-level generative calls. It wraps the model's raw output and an optional parsed representation typed to your output schema (accessible via .result).

The value is computed lazily — the underlying inference call may not have completed when the thunk is returned. Accessing .value blocks until the result is ready. For async code, use await thunk.avalue() to await completion, or await thunk.astream() to consume output chunk by chunk as it arrives.

You can also call str(thunk) to get the raw string output directly.

Use thunk.is_computed() to check whether the value has already been filled without triggering evaluation.

Plugin

A Plugin is a class-based extension point in Mellea that groups multiple hooks sharing instance state. Inherit from Plugin and set name and priority as class keyword arguments. Decorate methods with @hook(HookType.XXX) to subscribe to pipeline events.

Use standalone @hook functions for single-concern hooks. Use Plugin subclasses when multiple hooks need shared state (e.g., a redaction counter or rate limiter).

from mellea.plugins import Plugin, hook, HookType

class MyPlugin(Plugin, name="my-plugin", priority=10):
    @hook(HookType.GENERATION_PRE_CALL)
    async def before_llm(self, payload, ctx):
        ...

See: Plugins & Hooks

PluginSet

A PluginSet groups related hooks and Plugin instances into a reusable, named bundle. Use it to organize plugins by concern (security, observability, compliance) and register or scope them as a unit. A PluginSet accepts any mix of standalone @hook functions, Plugin instances, or nested PluginSets.

from mellea.plugins import PluginSet

security = PluginSet("security", [hook_a, hook_b, plugin_instance])

See: Plugins & Hooks

PreconditionException

Raised when a requirement attached to a @generative function's input arguments fails — i.e., before the LLM call is made. Catch it to handle pre-call validation failures gracefully.

from mellea.stdlib.components.genstub import PreconditionException

try:
    result = my_generative_fn(m, ...)
except PreconditionException as e:
    print(e.validation)  # list of ValidationResult

See: Handling Exceptions and Failures

Purple elephant effect

The tendency for a model to produce the very thing you instructed it to avoid, because the instruction draws attention to it. Named after the cognitive phenomenon: "Don't think about a purple elephant" — and now you are.

In Mellea, avoid it by using check() instead of req() for negative constraints. check() validates the output without including the constraint description in the generation prompt:

from mellea.stdlib.requirements import req, check

requirements=[
    req("Mention key features."),                        # model is told this
    check("Must not use the phrase 'industry-leading'"), # model is not told this
]

See: Evaluate with LLM-as-a-Judge

ReAct

Reason + Act — a goal-driven agentic loop where the LLM alternates between reasoning about the next step and calling a tool, repeating until the goal is achieved. Mellea provides mellea.stdlib.frameworks.react.react() as a built-in async implementation:

from mellea.stdlib.frameworks.react import react
result, _ = await react(goal="...", context=ChatContext(), backend=m.backend, tools=[...])

See: Tools and Agents

Requirement

A Requirement is a validation constraint applied to a generative function's output. Requirements can be programmatic (lambda, regex, type check) or generative (another LLM call). Used in the IVR pattern.

req() and check() are the common shorthand constructors from mellea.stdlib.requirements:

req(description) — creates a Requirement whose description is included in the prompt, so the model knows to aim for it.
check(description) — creates a check-only Requirement whose description is not included in the prompt (avoids the "purple elephant effect" — mentioning a forbidden thing often makes the model produce it).
simple_validate(fn) — wraps a lambda or function into a validation_fn, bypassing LLM-as-a-judge for fast deterministic checks.
PythonExecutionReq — verifies that Python code in the LLM's output runs without raising an exception. Import from mellea.stdlib.requirements.python_reqs. Accepts timeout, allowed_imports, and use_sandbox (Docker-based isolation).

See: Requirements System

RichDocument

A RichDocument wraps a Docling parsed document to make PDFs, tables, and structured files queryable by the LLM. Extract tables as Table objects and pass them directly to m.transform() or m.query().

pip install 'mellea[docling]'

See: Working with Data

SimpleLRUCache

An LRU (least-recently-used) cache for storing DynamicCache KV blocks in LocalHFBackend. Pass one at construction time to enable prefix caching:

from mellea.backends.cache import SimpleLRUCache

backend = LocalHFBackend(
    model_id="ibm-granite/granite-3.3-2b-instruct",
    cache=SimpleLRUCache(capacity=5),
)

When the cache reaches capacity, the least recently used block is evicted and its GPU memory freed. Choose capacity based on available VRAM and block size — 1–3 for large documents, up to 10 for small reused fragments.

See: Prefix Caching and KV Blocks

SimpleContext

A stateless context where each call is independent — no conversation history is accumulated or sent to the backend. Use it for single-shot tasks where prior turns are irrelevant.

from mellea.stdlib.context import SimpleContext
ctx = SimpleContext()

For multi-turn conversations, use ChatContext instead.

See: Context and Sessions

Sampling strategy

A SamplingStrategy controls how the IVR loop behaves when a requirement fails. Mellea's built-in strategies:

Strategy	Behaviour
`RejectionSamplingStrategy`	Retry up to `loop_budget` times; return first passing result
`RepairTemplateStrategy`	Like rejection sampling but appends failure reasons to the original instruction
`MultiTurnStrategy`	Add validation failures as a new chat turn; model revises its previous attempt
`MajorityVotingStrategyForMath`	Generate N candidates; return the one supported by most (math expressions)
`MBRDRougeLStrategy`	Minimum Bayes Risk decoding using ROUGE-L; best for text generation tasks
`SOFAISamplingStrategy`	Fast System-1 generation verified by a slower System-2 model
`BudgetForcingSamplingStrategy`	Inject thinking tokens to expand reasoning budget
`BaseSamplingStrategy`	Abstract base; extend to implement custom repair and selection logic

See: Inference-Time Scaling

SamplingResult

The return type of session calls made with return_sampling_results=True, and of the serve() function used with m serve. Holds .result (the selected output), .success (whether a requirement was met), and .sample_generations (all candidates generated).

Table

An MObject wrapping a single table extracted from a RichDocument. Supports m.query() and m.transform() directly, plus .to_markdown() and .transpose().

tables = rich_doc.get_tables()
summary = m.query(tables[0], "What is the total in the last row?")

See: Working with Data

TestBasedEval

A Component in mellea.stdlib.components.unit_test_eval that formats an LLM-as-a-judge evaluation task for structured test cases loaded from JSON. Use it in offline evaluation pipelines to verify model behaviour against a set of input/target pairs.

from mellea.stdlib.components.unit_test_eval import TestBasedEval

test_evals = TestBasedEval.from_json_file("tests/eval_data/cases.json")
for eval_case in test_evals:
    verdict = judge_session.instruct(eval_case)

See: Unit Test Generative Code

TemplateFormatter

A ChatFormatter subclass that renders prompts using Jinja2 templates instead of the default chat-message format. Use it when you need precise control over how components are serialised into the final prompt string. Configured per-backend.

See: Template Formatting

TemplateRepresentation

The data class a Component returns from format_for_llm() to describe itself to the TemplateFormatter. It carries the component's template string, named arguments, tool definitions, and field list — everything the formatter needs to render the component into a prompt fragment.

See: Mellea Core Internals

SOFAI

SOFAI (System-1 / System-2 AI) is a sampling strategy in Mellea that mirrors dual-process cognition: a fast "System 1" model generates candidates and a slower "System 2" model verifies them. Uses SOFAISamplingStrategy.

See: Inference-Time Scaling

Tool

A Python function decorated with @tool (or registered via MelleaSession) that Mellea exposes to an LLM for function calling. Tools have typed inputs and outputs so the LLM can call them reliably without free-form parsing.

See: Tools and Agents

ValidationResult

The return type of a custom verifier function. Holds a boolean result (pass/fail) and optional metadata — reason (string explanation), score (float), and thunk (the raw ModelOutputThunk if the verifier used an LLM call internally).

from mellea.core.requirement import ValidationResult

def my_verifier(output: str) -> ValidationResult:
    passed = len(output.split()) < 50
    return ValidationResult(passed, reason="Too long" if not passed else None)

See: Write Custom Verifiers

Thunk

See ModelOutputThunk.

wait_for_all_mots

A helper from mellea.helpers.async_helpers that concurrently resolves a list of ModelOutputThunk objects. All thunks in the list are awaited in parallel; the call returns when every thunk has been computed.

from mellea.helpers.async_helpers import wait_for_all_mots

thunks = [await m.ainstruct(...) for _ in items]
await wait_for_all_mots(thunks)
# All thunks are now resolved — access .value on each.

Total wall-clock time is roughly the latency of the slowest single call rather than the sum of all calls. Use SimpleContext (the default) when calling wait_for_all_mots; concurrent writes to ChatContext can corrupt state.

See: Tutorial 02: Streaming and Async

act() / aact()​

aLoRA (Activated LoRA)​

@generative​

Backend​

ChatContext​

CBlock​

Component​

ComponentParseError​

ContextTurn​

Context​

Document​

Generative function​

Generative program​

GenerateLog​

Hook / HookType​

grounding_context​

CRITERIA_BANK​

SCORING_SCHEMA_BANK​

factuality_correction()​

factuality_detection()​

guardian_check()​

GuardianCheck​

GuardianRisk​

policy_guardrails()​

Granite Switch​

KV smashing​

LiteLLM / LiteLLMBackend​

LLM-as-a-judge​

ImageBlock​

Intrinsic​

Instruction​

IVR (Instruct-Validate-Repair)​

MelleaSession​

mify / @mify​

MObject​

ModelOption​

ModelOutputThunk​

Plugin​

PluginSet​

PreconditionException​

Purple elephant effect​

ReAct​

Requirement​

RichDocument​

SimpleLRUCache​

SimpleContext​

Sampling strategy​

SamplingResult​

Table​

TestBasedEval​

TemplateFormatter​

TemplateRepresentation​

SOFAI​

Tool​

ValidationResult​

Thunk​

wait_for_all_mots​

act() / aact()

aLoRA (Activated LoRA)

@generative

Backend

ChatContext

CBlock

Component

ComponentParseError

ContextTurn

Context

Document

Generative function

Generative program

GenerateLog

Hook / HookType

grounding_context

CRITERIA_BANK

SCORING_SCHEMA_BANK

factuality_correction()

factuality_detection()

guardian_check()

GuardianCheck

GuardianRisk

policy_guardrails()

Granite Switch

KV smashing

LiteLLM / LiteLLMBackend

LLM-as-a-judge

ImageBlock

Intrinsic

Instruction

IVR (Instruct-Validate-Repair)

MelleaSession

mify / @mify

MObject

ModelOption

ModelOutputThunk

Plugin

PluginSet

PreconditionException

Purple elephant effect

ReAct

Requirement

RichDocument

SimpleLRUCache

SimpleContext

Sampling strategy

SamplingResult

Table

TestBasedEval

TemplateFormatter

TemplateRepresentation

SOFAI

Tool

ValidationResult

Thunk

wait_for_all_mots