Mellea Core Internals
Advanced: This page is for contributors, backend developers, and anyone who wants to understand what happens when Mellea executes a request. If you are building applications with Mellea, you do not need this material.
Mellea's high-level API (m.chat(), m.instruct(), @generative) is built on three
core data structures. Understanding these structures and the abstraction layers above
them explains how Mellea achieves lazy evaluation, parallel dispatch, and composable
context management.
The three core data structures
CBlock
A CBlock (content block) is a wrapper around a string that marks a tokenisation
and KV caching boundary:
from mellea.core import CBlock
block = CBlock("What is 1+1?")
CBlocks are the leaf nodes of every data dependency graph in Mellea. Importantly,
CBlock boundaries affect tokenisation:
tokenise(CBlock(a) + CBlock(b)) == tokenise(a) + tokenise(b)
This may differ from tokenise(a + b). When you care about KV cache reuse, CBlock
boundaries let you control exactly where the tokeniser makes splits.
Component
A Component is a declarative structure that can depend on other Components or
CBlocks. Components are the unit of composition in Mellea. Message,
Instruction, @mify objects, and @generative functions all produce Components.
ModelOutputThunk
A ModelOutputThunk is a lazy reference to a computation result. It represents the
future output of an LLM call — the call may or may not have been dispatched yet
when you receive the thunk. You can pass a thunk as an input to another Component
before the underlying computation has completed.
thunk.is_computed() # True if the value is already available
await thunk.avalue() # Force evaluation; returns the actual value
This lazy evaluation model lets the backend see the full dependency graph of a request before executing anything, enabling batching and optimisation.
The abstraction layers
Each layer below is a thinner wrapper around the one beneath it. You work at whatever level of abstraction the task requires.
Layer 1: MelleaSession
The entry point for most programs. The session bundles a backend, a context, and high-level methods. Everything is handled for you:
from mellea import MelleaSession
from mellea.backends.ollama import OllamaModelBackend
from mellea.stdlib.context import SimpleContext
m = MelleaSession(backend=OllamaModelBackend("granite4:latest"), ctx=SimpleContext())
response = m.chat("What is 1+1?")
print(response.content)
When you call m.chat(), the session:
- Wraps your string in a
Messagecomponent - Passes the component and context to the backend
- Updates the context with the result
- Returns the response as a
Message
Layer 2: Functional API with explicit context
The functional API (mfuncs) exposes the same operations as stateless functions.
Context is threaded explicitly — you pass it in and get a new context back:
import mellea.stdlib.functional as mfuncs
from mellea.backends.ollama import OllamaModelBackend
from mellea.stdlib.context import SimpleContext
response, next_context = mfuncs.chat(
"What is 1+1?",
context=SimpleContext(),
backend=OllamaModelBackend("granite4:latest"),
)
print(response.content)
This is useful when you need to fork, merge, or snapshot context explicitly.
Layer 3: Direct component construction with mfuncs.act()
mfuncs.act() accepts any component or CBlock directly. All other mfuncs
functions (chat, instruct, etc.) are thin wrappers that construct a component
and then call act():
import mellea.stdlib.functional as mfuncs
from mellea.backends.ollama import OllamaModelBackend
from mellea.stdlib.components import Instruction
from mellea.stdlib.context import SimpleContext
response, next_context = mfuncs.act(
action=Instruction("What is 1+1?"),
context=SimpleContext(),
backend=OllamaModelBackend("granite4:latest"),
)
print(response.value)
Layer 4: Async execution with mfuncs.aact()
Mellea's core is async. The synchronous API wraps the async operations with
asyncio.run(). For each method in mfuncs there is an a* async version:
import asyncio
import mellea.stdlib.functional as mfuncs
from mellea.backends.ollama import OllamaModelBackend
from mellea.stdlib.components import Instruction
from mellea.stdlib.context import SimpleContext
async def main():
response, _ = await mfuncs.aact(
Instruction("What is 1+1?"),
context=SimpleContext(),
backend=OllamaModelBackend("granite4:latest"),
)
print(response.value)
asyncio.run(main())
Layer 5: Lazy computation via backend.generate_from_context()
mfuncs.aact() is itself a convenience wrapper around the backend's
generate_from_context() method. Calling it directly returns a ModelOutputThunk
rather than an evaluated response:
import asyncio
from mellea.backends.ollama import OllamaModelBackend
from mellea.core import CBlock
from mellea.stdlib.context import SimpleContext
async def main():
backend = OllamaModelBackend("granite4:latest")
ctx = SimpleContext()
response, _ = await backend.generate_from_context(CBlock("What is 1+1?"), ctx=ctx)
print(f"Computed: {response.is_computed()}") # may be False
print(await response.avalue()) # forces evaluation
print(f"Computed: {response.is_computed()}") # True
asyncio.run(main())
Layer 6: Composing lazy computations
Because thunks are lazy, you can pass a thunk as an input to a second computation before the first one has been evaluated. This lets the backend optimise across the full dependency graph:
import asyncio
from mellea.backends.ollama import OllamaModelBackend
from mellea.core import Backend, CBlock, Context
from mellea.stdlib.components import SimpleComponent
from mellea.stdlib.context import SimpleContext
async def main(backend: Backend, ctx: Context):
x, _ = await backend.generate_from_context(CBlock("What is 1+1?"), ctx=ctx)
y, _ = await backend.generate_from_context(CBlock("What is 2+2?"), ctx=ctx)
# x and y may not have been computed yet — we can still use them as inputs
z, _ = await backend.generate_from_context(
SimpleComponent(instruction="What is x+y?", x=x, y=y),
ctx=ctx,
)
print(f"x computed: {x.is_computed()}")
print(f"y computed: {y.is_computed()}")
print(await z.avalue()) # forces evaluation of the whole graph
asyncio.run(main(OllamaModelBackend("granite4:latest"), SimpleContext()))
The backend sees z's dependency on x and y, evaluates them in order (or
in parallel if the backend supports it), and returns z's result.
Layer summary
| Layer | Entry point | Who uses it |
|---|---|---|
MelleaSession | m.chat(), m.instruct() | Application developers |
mfuncs synchronous | mfuncs.chat(), mfuncs.act() | Application developers needing context control |
mfuncs async | mfuncs.aact(), mfuncs.achat() | Advanced users building async pipelines |
backend.generate_from_context() | Thunks, is_computed(), avalue() | Backend developers, advanced users |
| Composition | SimpleComponent with thunk inputs | Backend developers |
Template and prompt engineering
TemplateFormatter
Mellea formats Python objects into LLM-readable text using a TemplateFormatter.
It uses Jinja2 templates stored in a templates/prompts/ directory. Each
component class can have its own template, looked up by class name.
The formatter resolves templates in this order:
- Cached templates (from recent lookups)
- The formatter's configured template path
- The package that owns the component (
melleaor a third-party package)
Within a template path, the formatter traverses subdirectories matching the model
ID before falling back to default/:
templates/prompts/
├── default/
│ └── Instruction.jinja2 ← fallback for all models
└── granite/
└── granite-3-2/
└── instruct/
└── Instruction.jinja2 ← used for ibm-granite/granite-3.2-8b-instruct
The formatter returns the template from the deepest matching directory. A model ID
of ibm-granite/granite-3.2-8b-instruct matches granite/granite-3-2/instruct
but not ibm/ — only one path should match in any given templates directory.
TemplateRepresentation
Each component's format_for_llm() method returns either a string or a
TemplateRepresentation. The TemplateRepresentation specifies:
- A reference to the component instance
- A dictionary of arguments passed to the template renderer
- A list of tools or functions related to the component
- Either a
template(inline Jinja2 string) or atemplate_order(list of template file names to look up, where*means the class name)
The simplest approach is to return a string directly — this bypasses templating entirely:
def format_for_llm(self) -> str:
return f"Summarise: {self.text}"
Customising templates for an existing class
To change how an existing component is rendered, subclass it and override
format_for_llm(). Then create a new template file at the appropriate path.
See docs/examples/mify/rich_document_advanced.py
for a worked example.
See also: Generative Programming | Working with Data | Async and Streaming