Quickstart¶
Prerequisites¶
- Python 3.10+
- uv or pip
Optional: API keys for your embedding provider (e.g. OPENAI_API_KEY) when you run real evaluations instead of dummy embeddings.
Install¶
| Goal | Command |
|---|---|
| Global CLI | uv tool install chunktuner |
| Project dependency | uv add chunktuner |
| One-off CLI (no install) | uvx chunktuner chunk-tune --help |
Optional extras¶
| Extra | Purpose |
|---|---|
chunktuner[docling] |
PDF, DOCX, PPTX ingestion |
chunktuner[ragas] |
Faithfulness / answer relevancy metrics |
chunktuner[semantic] |
Semantic chunking strategies |
chunktuner[code] |
Tree-sitter code strategies |
chunktuner[mcp] |
MCP server (chunk-tune-mcp) |
chunktuner[all] |
All of the above |
First run (CLI)¶
Typical flow: create workspace config, estimate cost, then recommend.
chunk-tune init --provider openai
chunk-tune estimate ./my_docs --use-case rag_qa
chunk-tune recommend ./my_docs --use-case rag_qa
estimate is dry-run (no paid API calls). recommend uses dummy embeddings unless you pass --embedding-model and confirm with --yes — see CLI reference.
Minimal Python example¶
from pathlib import Path
from chunktuner import (
AutoTuner,
DummyEmbeddingFunction,
Evaluator,
FileIngestor,
ScoreCalculator,
default_registry,
)
# 1) Load documents from a directory (respects supported extensions).
docs = FileIngestor().ingest_dir(Path("./my_docs"))
# 2) Embeddings: dummy is free; swap for LiteLLMEmbeddingFunction for real runs.
embedding_fn = DummyEmbeddingFunction()
# 3) Evaluator + scorer for your use case (rag_qa, search, summarization, code_assist).
evaluator = Evaluator(embedding_fn)
scorer = ScoreCalculator(use_case="rag_qa")
# 4) Grid search over registered strategies.
tuner = AutoTuner(
strategies=default_registry,
evaluator=evaluator,
scorer=scorer,
)
result = tuner.recommend(docs, use_case="rag_qa")
print(result.best.config)
Use ingest_path when you have a single file, or ingest_dir for a tree. See Python API for URLIngestor, RepoIngestor, caching, and parallel tuning.
What’s next¶
- Configuration —
.autochunk.yamland environment variables - Strategy guide
- Metrics reference and metrics glossary
- MCP setup