Provider configuration¶
chunktuner routes LLM and embedding calls through LiteLLM, so you can use OpenAI, Anthropic, Google Gemini, Cohere, Together, Azure OpenAI, or local OpenAI-compatible servers (LM Studio, Ollama, vLLM) by choosing the right model id and credentials.
Configuration surfaces¶
| Surface | How to set base URL and API key |
|---|---|
| Python | LiteLLMEmbeddingFunction(model, api_base=..., api_key=...), Evaluator(..., llm_api_base=..., llm_api_key=...), DatasetBuilder(..., llm_api_base=..., llm_api_key=...), AgenticStrategy(api_base=..., api_key=...) |
| CLI | --api-base, --api-key, --llm-model on chunk-tune recommend, evaluate, and compare; or api_base / api_key / embedding_model / llm_model in .autochunk.yaml |
| MCP server | CHUNKTUNER_API_BASE, CHUNKTUNER_API_KEY, CHUNKTUNER_LLM_MODEL (do not pass secrets in MCP tool arguments) |
Priority for base URL and key: CLI flags override workspace YAML, which overrides CHUNKTUNER_* environment variables. Provider-specific env vars (for example OPENAI_API_KEY, GEMINI_API_KEY) are still read by LiteLLM when you use that provider’s models.
Workspace defaults¶
In .autochunk.yaml, embedding_model defaults to null — chunk-tune init does not write an OpenAI model there any more. With embedding_model: null, all evaluate / recommend / compare runs use dummy embeddings (no API calls, no cost) until you either pass --embedding-model on the CLI or set the field in the YAML. Set llm_model to the LiteLLM model id used for agentic chunking and generation-style metrics when enabled.
Optional fields:
api_base— custom OpenAI-compatible endpoint (LM Studio, Ollama, Azure host, etc.).api_key— optional explicit key; preferCHUNKTUNER_API_KEYor provider env vars in CI.
Quick examples¶
OpenAI — set OPENAI_API_KEY, then e.g. --embedding-model text-embedding-3-small and --llm-model gpt-4o-mini.
Anthropic / Claude — set ANTHROPIC_API_KEY. Claude has no embeddings endpoint, so pair it with Gemini or a local embedding model:
chunk-tune recommend ./docs \
--embedding-model gemini/gemini-embedding-001 \
--llm-model claude-3-haiku-20240307
Google Gemini — set GEMINI_API_KEY, then e.g. --embedding-model gemini/gemini-embedding-001 and --llm-model gemini/gemini-2.0-flash.
LM Studio — start the local server, then:
export CHUNKTUNER_API_BASE=http://localhost:1234/v1
export CHUNKTUNER_API_KEY=lm-studio
export CHUNKTUNER_LLM_MODEL=openai/llama-3.2-3b-instruct
uv run --extra mcp chunk-tune-mcp
Or with the CLI:
chunk-tune recommend ./docs \
--embedding-model openai/nomic-embed-text-v1.5 \
--api-base http://localhost:1234/v1 \
--api-key lm-studio \
--llm-model openai/llama-3.2-3b-instruct \
--yes
MCP note¶
The MCP server does not accept api_key in tool JSON. Configure CHUNKTUNER_API_BASE, CHUNKTUNER_API_KEY, and optionally CHUNKTUNER_LLM_MODEL before launching chunk-tune-mcp.
RAGAS¶
Faithfulness and answer-relevancy still go through RagasBridge (LangChain stack). For non-OpenAI setups, configure the underlying LangChain/OpenAI env vars separately if required; see the evaluator docstring in eval/evaluator.py for the known limitation.