## Core
The core module provides foundational utilities for configuration, LLM management, and grading logic.
### EvaluatorConfig

`truthfulness_evaluator.core.config.EvaluatorConfig`

Bases: `BaseSettings`
Configuration for the truthfulness evaluator.
Source code in src/truthfulness_evaluator/core/config.py
#### `model_post_init(__context)`
Fall back to standard environment variables if the `TRUTH_` prefix is not used.
Source code in src/truthfulness_evaluator/core/config.py
Usage Example:

```python
from truthfulness_evaluator.core.config import EvaluatorConfig

config = EvaluatorConfig(
    llm_provider="openai",
    llm_model="gpt-4o",
    llm_temperature=0.0,
    web_search_enabled=True,
    max_web_results=10,
)
```
Environment Variables:

All configuration can be set via environment variables with the `TRUTH_` prefix:

```bash
export TRUTH_LLM_PROVIDER=openai
export TRUTH_LLM_MODEL=gpt-4o
export TRUTH_LLM_TEMPERATURE=0.0
export TRUTH_WEB_SEARCH_ENABLED=true
export TRUTH_MAX_WEB_RESULTS=10
```
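Taken together with `model_post_init` above, the prefixed variables are optional when standard provider variables are already present. A minimal sketch of that fallback, assuming it covers provider API keys such as `OPENAI_API_KEY` (exactly which unprefixed variables are inspected is an assumption):

```python
import os

from truthfulness_evaluator.core.config import EvaluatorConfig

# No TRUTH_-prefixed variables are set here; only the standard provider key.
# Using OPENAI_API_KEY as the fallback source is an assumption.
os.environ["OPENAI_API_KEY"] = "sk-..."

config = EvaluatorConfig()  # model_post_init picks up the unprefixed variable
```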
### create_chat_model

`truthfulness_evaluator.llm.factory.create_chat_model(model_name, temperature=0, **kwargs)`
Create a chat model instance from a model name string.
Centralizes the OpenAI vs. Anthropic routing logic that was previously scattered across 5+ files.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `model_name` | `str` | Model identifier (e.g., `"gpt-4o"`, `"claude-sonnet-4-5"`). | *required* |
| `temperature` | `float` | Sampling temperature. | `0` |
| `**kwargs` | `Any` | Additional model-specific parameters, passed through to the underlying model constructor. | `{}` |
Returns:

| Type | Description |
|---|---|
| `BaseChatModel` | Configured `BaseChatModel` instance. |
Raises:

| Type | Description |
|---|---|
| `ValueError` | If provider cannot be determined from model name. |
Source code in src/truthfulness_evaluator/llm/factory.py
Usage Example:

```python
from truthfulness_evaluator.llm.factory import create_chat_model

# Provider is inferred from the model name; temperature defaults to 0
llm = create_chat_model("gpt-4o")

# Override the temperature
llm = create_chat_model("claude-3-5-sonnet-20241022", temperature=0.2)
```
Usage Note: This is the centralized LLM factory function. All chains and adapters use this function to instantiate LLM instances, ensuring consistent configuration and provider routing.
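To make the routing rule concrete, here is an illustrative sketch of how a provider might be inferred from a model name; the exact prefixes checked are an assumption, and the real logic lives in `src/truthfulness_evaluator/llm/factory.py`:

```python
# Illustrative only: the actual factory also instantiates and returns
# the chat model; the prefix checks here are assumptions.
def infer_provider(model_name: str) -> str:
    if model_name.startswith("gpt-"):
        return "openai"
    if model_name.startswith("claude"):
        return "anthropic"
    raise ValueError(f"Cannot determine provider from model name: {model_name!r}")
```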
## Grading
The grading module provides utilities for calculating truthfulness grades and building final reports.
`truthfulness_evaluator.core.grading`
Grading and summary logic for truthfulness reports.
### `build_report(source_document, claims, verifications, *, confidence_threshold=0.7, grade=None, summary=None)`
Build a complete TruthfulnessReport with computed fields.
This is the primary way to construct a report. It computes grade, statistics, confidence, and summary from the raw data.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `source_document` | `str` | Path/URL of source document. | *required* |
| `claims` | `list[Claim]` | Extracted claims. | *required* |
| `verifications` | `list[VerificationResult]` | Verification results. | *required* |
| `confidence_threshold` | `float` | Threshold for considering claims verified. | `0.7` |
| `grade` | `str \| None` | Override grade (if `None`, computed from verifications). | `None` |
| `summary` | `str \| None` | Override summary (if `None`, generated automatically). | `None` |
Returns:

| Type | Description |
|---|---|
| `TruthfulnessReport` | `TruthfulnessReport` instance. |
Source code in src/truthfulness_evaluator/core/grading.py
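As a hedged sketch of the override behavior described above (variable names and the source path are placeholders; reading `grade` and `summary` back off the report assumes the report exposes fields matching those parameters):

```python
from truthfulness_evaluator.core.grading import build_report

# Computed path: grade, statistics, and summary derived from the data.
report = build_report(
    source_document="docs/article.md",   # placeholder path
    claims=extracted_claims,             # placeholder: claim-extraction output
    verifications=verification_results,  # placeholder: verification output
)

# Override path: pin the grade and summary instead of computing them.
pinned = build_report(
    source_document="docs/article.md",
    claims=extracted_claims,
    verifications=verification_results,
    grade="B",
    summary="Manually reviewed; most claims check out.",
)
```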
### `calculate_grade(verifications, confidence_threshold=0.7)`
Calculate letter grade from verification results.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `verifications` | `list[VerificationResult]` | List of verification results. | *required* |
| `confidence_threshold` | `float` | Minimum confidence to consider a claim verified. | `0.7` |
Returns:

| Type | Description |
|---|---|
| `str` | Letter grade string (A+ through F). |
Source code in src/truthfulness_evaluator/core/grading.py
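The mapping from results to a letter is not spelled out in this reference; the sketch below shows the likely shape of the computation, with cutoffs that are assumptions rather than the library's actual curve:

```python
from truthfulness_evaluator.core.grading import is_verified

# Sketch only: the ratio-to-letter cutoffs are assumptions.
def sketch_grade(verifications, confidence_threshold=0.7):
    if not verifications:
        return "F"
    verified = sum(1 for v in verifications if is_verified(v, confidence_threshold))
    ratio = verified / len(verifications)
    for cutoff, letter in [(0.97, "A+"), (0.9, "A"), (0.8, "B"), (0.7, "C"), (0.6, "D")]:
        if ratio >= cutoff:
            return letter
    return "F"
```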
### `calculate_statistics(claims, verifications)`
Calculate statistics from claims and verifications.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `claims` | `list[Claim]` | List of claims. | *required* |
| `verifications` | `list[VerificationResult]` | List of verification results. | *required* |
Returns:

| Type | Description |
|---|---|
| `TruthfulnessStatistics` | `TruthfulnessStatistics` instance. |
Source code in src/truthfulness_evaluator/core/grading.py
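A rough sketch of what this computes; the `TruthfulnessStatistics` field names below (`total_claims`, `verified_claims`) and the import path are hypothetical:

```python
from truthfulness_evaluator.core.grading import is_verified
from truthfulness_evaluator.models import TruthfulnessStatistics  # import path is an assumption

def sketch_statistics(claims, verifications):
    # Field names are hypothetical placeholders.
    return TruthfulnessStatistics(
        total_claims=len(claims),
        verified_claims=sum(1 for v in verifications if is_verified(v)),
    )
```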
### `generate_summary(grade, statistics)`
Generate human-readable summary of evaluation results.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `grade` | `str` | Letter grade. | *required* |
| `statistics` | `TruthfulnessStatistics` | `TruthfulnessStatistics` instance. | *required* |
Returns:

| Type | Description |
|---|---|
| `str` | Summary string. |
Source code in src/truthfulness_evaluator/core/grading.py
### `is_verified(result, confidence_threshold=0.7)`
Whether a verification result meets the verification criteria.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `result` | `VerificationResult` | The verification result to check. | *required* |
| `confidence_threshold` | `float` | Minimum confidence to consider verified. | `0.7` |
Returns:

| Type | Description |
|---|---|
| `bool` | `True` if verdict is `SUPPORTS`/`REFUTES` and confidence meets threshold. |
Source code in src/truthfulness_evaluator/core/grading.py
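The criterion reduces to a single boolean check. A minimal sketch, assuming `VerificationResult` exposes `verdict` and `confidence` attributes (names inferred from the table above) and that verdicts compare equal to the strings shown:

```python
# Sketch of the documented criterion; attribute names and string-valued
# verdicts are assumptions.
def sketch_is_verified(result, confidence_threshold=0.7):
    return (
        result.verdict in ("SUPPORTS", "REFUTES")
        and result.confidence >= confidence_threshold
    )
```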
Functions:

- `calculate_grade(verifications, confidence_threshold=0.7) -> str`: Calculate a letter grade (A+ through F) from verification results
- `is_verified(result, confidence_threshold=0.7) -> bool`: Check whether a verification result counts as verified (verdict of `SUPPORTS`/`REFUTES` at or above the confidence threshold)
- `build_report(source_document, claims, verifications, *, confidence_threshold=0.7, grade=None, summary=None) -> TruthfulnessReport`: Build a complete report with computed grade, statistics, confidence, and summary
Usage Example:

```python
from truthfulness_evaluator.core.grading import build_report, calculate_grade, is_verified

# Calculate a letter grade from verification results
grade = calculate_grade(verification_results)  # e.g. "B"

# Check whether a single result counts as verified
verified = is_verified(verification_results[0])

# Build a complete report
report = build_report(
    source_document="docs/article.md",  # placeholder path
    claims=extracted_claims,
    verifications=verification_results,
)
```