models

chunktuner.models

Domain models: the single source of truth for the library, CLI, and API.

Document

Bases: BaseModel

Ingested unit of text (file, URL, or synthetic) passed to chunking and evaluation.

Chunk

Bases: BaseModel

Text span within a Document; offsets must satisfy doc.content[start_offset:end_offset] == text.

from_document classmethod

from_document(
    doc, *, id, start_offset, end_offset, **kwargs
)

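Build a chunk by slicing doc; text is always doc.content[start_offset:end_offset]. Raises ValueError unless 0 <= start_offset < end_offset <= len(doc.content), so empty and out-of-range spans are rejected.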

Source code in src/chunktuner/models.py
@classmethod
def from_document(
    cls,
    doc: Document,
    *,
    id: str,
    start_offset: int,
    end_offset: int,
    **kwargs: Any,
) -> Chunk:
    """Build a chunk from ``doc`` slices; ``text`` is always ``doc.content[start:end]``."""
    n = len(doc.content)
    if not (0 <= start_offset < end_offset <= n):
        raise ValueError(
            f"Offsets [{start_offset}:{end_offset}] out of bounds for doc {doc.id!r} "
            f"(length {n})"
        )
    text = doc.content[start_offset:end_offset]
    return cls(
        id=id,
        document_id=doc.id,
        text=text,
        start_offset=start_offset,
        end_offset=end_offset,
        **kwargs,
    )
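
A minimal usage sketch of the offset contract follows. To stay self-contained it uses dataclass stand-ins rather than the real pydantic models, and it assumes Document carries only id and content and Chunk only the five fields that from_document sets. The real classes may define more fields and validation.

```python
from dataclasses import dataclass


@dataclass
class Document:
    """Stand-in for chunktuner's Document; only the fields used here."""
    id: str
    content: str


@dataclass
class Chunk:
    """Stand-in for chunktuner's Chunk, with the fields from_document sets."""
    id: str
    document_id: str
    text: str
    start_offset: int
    end_offset: int

    @classmethod
    def from_document(cls, doc, *, id, start_offset, end_offset, **kwargs):
        # Same bounds check as the source listing above.
        n = len(doc.content)
        if not (0 <= start_offset < end_offset <= n):
            raise ValueError(
                f"Offsets [{start_offset}:{end_offset}] out of bounds for doc {doc.id!r} "
                f"(length {n})"
            )
        return cls(
            id=id,
            document_id=doc.id,
            text=doc.content[start_offset:end_offset],
            start_offset=start_offset,
            end_offset=end_offset,
            **kwargs,
        )


doc = Document(id="doc-1", content="Hello, world!")
chunk = Chunk.from_document(doc, id="doc-1:0", start_offset=0, end_offset=5)
assert chunk.text == "Hello"

# Empty, negative, and past-the-end spans are all rejected with ValueError.
for start, end in [(3, 3), (-1, 4), (0, 14)]:
    try:
        Chunk.from_document(doc, id="bad", start_offset=start, end_offset=end)
    except ValueError as exc:
        print(exc)
```

Because text is derived from the offsets rather than passed in, a chunk built this way cannot drift out of sync with its document.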

ChunkConfig

Bases: BaseModel

Named strategy plus strategy-specific hyperparameters (params).

ChunkingStrategy

Bases: Protocol

Pluggable chunker: exposes metadata, chunk(), parameter schema, and search grid.
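
As a rough illustration of what a structurally typed chunker could look like, the sketch below defines a protocol and a fixed-size implementation. Only chunk() is named in this reference; the attribute name, the methods param_schema and search_grid, and every signature here are hypothetical placeholders, not the library's actual interface.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class ChunkingStrategy(Protocol):
    """Hypothetical shape; only chunk() is named in the reference above."""

    name: str  # assumed metadata attribute

    # Assumed signature: returns (start_offset, end_offset) pairs.
    def chunk(self, content: str, params: dict[str, Any]) -> list[tuple[int, int]]: ...

    def param_schema(self) -> dict[str, Any]: ...  # assumed name

    def search_grid(self) -> list[dict[str, Any]]: ...  # assumed name


class FixedSizeChunker:
    """Splits text into consecutive windows of params["size"] characters."""

    name = "fixed_size"

    def chunk(self, content, params):
        size = params["size"]
        return [(i, min(i + size, len(content))) for i in range(0, len(content), size)]

    def param_schema(self):
        return {"size": {"type": "integer", "minimum": 1}}

    def search_grid(self):
        # Candidate hyperparameter settings a tuner could iterate over.
        return [{"size": s} for s in (4, 8)]


chunker = FixedSizeChunker()
assert isinstance(chunker, ChunkingStrategy)  # structural check, no inheritance
spans = chunker.chunk("abcdefghij", {"size": 4})
print(spans)  # [(0, 4), (4, 8), (8, 10)]
```

Using a Protocol rather than a base class means third-party chunkers only need to match the shape; they do not have to import or subclass anything from the library.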

EvalMetrics

Bases: BaseModel

Retrieval and optional generation metrics aggregated for one strategy run.

EvalResult

Bases: BaseModel

Outcome of evaluating one (strategy, ChunkConfig) on a corpus and dataset.

Recommendation

Bases: BaseModel

Ranked evaluation results from tuning, including the best config and optional baseline.

EmbeddingFunction

Bases: Protocol

Embeds chunk texts and queries; profile_name labels the model or dummy profile.
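
A dummy profile can be useful for tests that should not call a real model. The sketch below is one possible shape: profile_name comes from the description above, while the method names embed_texts and embed_query, their signatures, and the hash-based vectors are assumptions made for illustration only.

```python
import hashlib
from typing import Protocol


class EmbeddingFunction(Protocol):
    """Hypothetical shape; only profile_name is named in the reference above."""

    profile_name: str

    def embed_texts(self, texts: list[str]) -> list[list[float]]: ...  # assumed name

    def embed_query(self, query: str) -> list[float]: ...  # assumed name


class DummyEmbedding:
    """Deterministic, model-free embedding derived from a SHA-256 digest."""

    profile_name = "dummy-sha256"

    def __init__(self, dim: int = 8) -> None:
        if not 1 <= dim <= 32:  # a SHA-256 digest has 32 bytes
            raise ValueError("dim must be between 1 and 32")
        self.dim = dim

    def _embed(self, text: str) -> list[float]:
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        # Map the first `dim` bytes to floats in [0, 1].
        return [b / 255 for b in digest[: self.dim]]

    def embed_texts(self, texts):
        return [self._embed(t) for t in texts]

    def embed_query(self, query):
        return self._embed(query)


emb = DummyEmbedding()
vectors = emb.embed_texts(["first chunk", "second chunk"])
assert len(vectors) == 2 and all(len(v) == 8 for v in vectors)
# The same text always maps to the same vector, for chunks and queries alike.
assert emb.embed_query("first chunk") == vectors[0]
```

These vectors carry no semantic meaning, so they suit plumbing tests only; retrieval metrics computed with them say nothing about real embedding quality.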