Filesystem Evidence
When to Use
Verify documentation against actual code:
- README claims about API behavior
- Version requirements in docs
- Feature descriptions
- Installation instructions
Perfect for Documentation Drift Detection
Run filesystem verification in CI to catch when code changes break documentation. For example, if a function signature changes from process(data) to process(data, options), the agent will detect the mismatch and flag the claim as refuted.
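The comparison behind this kind of drift check can be illustrated with the standard library's `inspect.signature`. This is only a sketch, not how the agent works: the agent reads source files through its tools and does not import the code. `process_v1`, `process_v2`, and `signature_matches` are hypothetical names.

```python
import inspect


def process_v1(data):
    """Hypothetical original signature the README was written against."""


def process_v2(data, options):
    """Hypothetical signature after a code change added a parameter."""


def signature_matches(func, documented_params: list[str]) -> bool:
    """Return True if the function's parameter names equal the documented list."""
    actual = list(inspect.signature(func).parameters)
    return actual == documented_params


# The README documents `process(data)`.
documented = ["data"]
print(signature_matches(process_v1, documented))  # True
print(signature_matches(process_v2, documented))  # False: `options` is undocumented
```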
Scoping the Root Path
Set --root-path ./src to limit the agent to source code only, speeding up verification and avoiding false matches in test fixtures or example code. For full-repo verification including tests and configs, use --root-path .
Basic Usage
How It Works
The ReAct agent uses these tools:
| Tool | Purpose |
|---|---|
| list_files | Browse directory structure |
| read_file | Read source files |
| grep_files | Search for patterns |
| find_related_files | Follow imports/links |
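This page does not show how the tools are implemented. As a rough illustration only, the first three tools could be built on `pathlib` and `re` as shown below. The names match the table. The signatures, the `root` parameter, and the `file:line: text` return format are assumptions and are not the library's API.

```python
import re
from pathlib import Path
from typing import Optional

# Hypothetical tool sketches; `root` stands in for the agent's root_path.


def list_files(path: str = ".", pattern: str = "*", root: Path = Path(".")) -> list[str]:
    """List entries directly under root/path that match a glob pattern."""
    base = root / path
    return sorted(p.relative_to(root).as_posix() for p in base.glob(pattern))


def read_file(path: str, start_line: int = 1, end_line: Optional[int] = None,
              root: Path = Path(".")) -> str:
    """Return lines start_line..end_line (1-indexed, inclusive) of a file."""
    lines = (root / path).read_text(encoding="utf-8").splitlines()
    return "\n".join(lines[start_line - 1 : end_line])


def grep_files(pattern: str, file_pattern: str = "**/*.py", root: Path = Path(".")) -> list[str]:
    """Return 'file:line: text' for every line matching the regex `pattern`."""
    regex = re.compile(pattern)
    hits = []
    for file in sorted(root.glob(file_pattern)):
        if not file.is_file():
            continue
        text = file.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                hits.append(f"{file.relative_to(root).as_posix()}:{lineno}: {line.strip()}")
    return hits
```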
ReAct Agent Decision Loop
The filesystem agent implements a ReAct (Reasoning + Acting) loop to intelligently gather evidence:
```mermaid
graph TD
    A[Claim to Verify] --> B[Agent Reasons About Claim]
    B --> C{Choose Action}
    C -->|Browse| D[list_files]
    C -->|Read| E[read_file]
    C -->|Search| F[grep_files]
    C -->|Follow Links| G[find_related_files]
    D --> H{Enough Evidence?}
    E --> H
    F --> H
    G --> H
    H -->|No| B
    H -->|Yes| I[Return Evidence]
```
The agent starts by reasoning about what evidence would verify the claim, then chooses appropriate tools to gather it. After each action, the agent evaluates whether enough evidence has been collected or if additional exploration is needed. This continues until sufficient evidence is found or a maximum iteration limit is reached.
For example, to verify "The process() function accepts a DataFrame", the agent might:
- Reason: "I need to find the process() function definition"
- Act: `grep_files("def process", "*.py")`
- Observe: Found in `src/core.py:45`
- Reason: "Now I need to read the full signature"
- Act: `read_file("src/core.py", start_line=45, end_line=60)`
- Observe: Signature shows `def process(data: Union[DataFrame, dict], ...)`
- Reason: "Evidence confirms the claim"
- Return: Evidence with a high relevance score
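The loop above can be sketched in a few lines. In this toy version, a scripted `policy` function takes the place of the LLM's reasoning step. The tools are stubs that return the canned observations from the worked example. `react_loop`, `Action`, `scripted_policy`, and `max_iterations` are illustrative assumptions and are not the agent's actual interface.

```python
from typing import Callable, NamedTuple, Optional


class Action(NamedTuple):
    tool: Optional[str]  # None means "enough evidence, stop"
    args: dict


def react_loop(claim: str, policy: Callable[[str, list], Action],
               tools: dict[str, Callable[..., str]], max_iterations: int = 5) -> list[str]:
    """Reason -> act -> observe until the policy stops or the iteration budget runs out."""
    observations: list[str] = []
    for _ in range(max_iterations):
        action = policy(claim, observations)  # Reason: choose the next step
        if action.tool is None:  # Enough evidence? -> yes
            break
        result = tools[action.tool](**action.args)  # Act
        observations.append(result)  # Observe
    return observations


# Stub tools (hypothetical) returning the observations from the worked example.
tools = {
    "grep_files": lambda pattern, file_pattern: "src/core.py:45",
    "read_file": lambda path, start_line, end_line: "def process(data: Union[DataFrame, dict], ...)",
}


def scripted_policy(claim: str, observations: list) -> Action:
    """Stand-in for LLM reasoning: grep first, then read, then stop."""
    if not observations:
        return Action("grep_files", {"pattern": "def process", "file_pattern": "*.py"})
    if len(observations) == 1:
        return Action("read_file", {"path": "src/core.py", "start_line": 45, "end_line": 60})
    return Action(None, {})


evidence = react_loop("The process() function accepts a DataFrame", scripted_policy, tools)
print(evidence)
```

The `max_iterations` budget is what ends the loop if the policy never decides it has enough evidence.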
Example: API Documentation
README claims:
## API
The `process()` function accepts:
- `data`: DataFrame or dict
- `batch_size`: Integer, default 100
- Returns: Processed DataFrame
Verification:
Agent actions:
1. Search for process function definition
2. Read function signature
3. Check parameter types
4. Verify return type
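Steps 2 to 4 amount to comparing a parsed signature with the documented one. Below is a minimal sketch that uses Python's `ast` module (`ast.unparse` requires Python 3.9+). `SOURCE` holds the signature that appears in the trace below. `describe_signature` is a hypothetical helper and is not one of the agent's tools.

```python
import ast

# Signature as read from src/core.py in the trace below (body elided).
SOURCE = """
def process(data: Union[DataFrame, dict], batch_size: int = 100) -> Union[DataFrame, dict]:
    ...
"""


def describe_signature(source: str, name: str) -> dict:
    """Extract parameter annotations, defaults, and the return annotation of `name`.

    Assumes no positional-only or keyword-only parameters, for brevity.
    A default of None here means "no default" (or a literal None default).
    """
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef) and node.name == name:
            args = node.args.args
            # Defaults line up with the *last* len(defaults) positional parameters.
            defaults = [None] * (len(args) - len(node.args.defaults)) + node.args.defaults
            params = {
                a.arg: {
                    "type": ast.unparse(a.annotation) if a.annotation else None,
                    "default": ast.literal_eval(d) if d is not None else None,
                }
                for a, d in zip(args, defaults)
            }
            returns = ast.unparse(node.returns) if node.returns else None
            return {"params": params, "returns": returns}
    raise LookupError(f"function {name!r} not found")


sig = describe_signature(SOURCE, "process")
print(sig["params"]["data"]["type"])  # Union[DataFrame, dict]
print(sig["params"]["batch_size"]["default"])  # 100
print(sig["returns"] == "DataFrame")  # False: the return type is a Union
```

Parsing is used rather than importing the module, so `Union` and `DataFrame` never need to be resolved.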
Detailed agent execution trace:
🤖 Agent Step 1: Reasoning
→ Need to locate the process() function definition
🔧 Agent Step 1: Action
→ grep_files(pattern="def process", file_pattern="**/*.py")
📄 Agent Step 1: Observation
→ Found 1 match: src/core.py:45
🤖 Agent Step 2: Reasoning
→ Need to read the full function signature and docstring
🔧 Agent Step 2: Action
→ read_file(path="src/core.py", start_line=45, end_line=65)
📄 Agent Step 2: Observation
→ def process(data: Union[DataFrame, dict], batch_size: int = 100) -> Union[DataFrame, dict]:
🤖 Agent Step 3: Reasoning
→ Evidence confirms data accepts DataFrame or dict, batch_size defaults to 100
→ Return type is Union, not just DataFrame - partial match only
✅ Agent Step 3: Complete
→ Returning 2 evidence pieces with relevance scores
Result:
✅ process() accepts DataFrame or dict → SUPPORTS (95% confidence)
📁 Evidence: src/core.py:45
✅ batch_size default 100 → SUPPORTS (100% confidence)
📁 Evidence: src/core.py:45 (def process(..., batch_size: int = 100))
⚠️ Returns DataFrame → NOT_ENOUGH_INFO (60% confidence)
📁 Evidence: src/core.py:45 (returns Union[DataFrame, dict], not just DataFrame)
🗳️ Model votes: gpt-4o: REFUTES, gpt-4o-mini: NOT_ENOUGH_INFO
💡 Recommendation: Update docs to mention dict is also a possible return type
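This page does not explain how per-model votes are combined into the final verdict. One simple rule that matches the results shown here is a strict majority that falls back to `NOT_ENOUGH_INFO` on ties. The sketch below implements that rule purely as an illustration. The name `aggregate_votes` is hypothetical.

```python
from collections import Counter

VERDICTS = {"SUPPORTS", "REFUTES", "NOT_ENOUGH_INFO"}


def aggregate_votes(votes: dict[str, str]) -> str:
    """Hypothetical rule: a strict majority wins; otherwise NOT_ENOUGH_INFO."""
    unknown = set(votes.values()) - VERDICTS
    if unknown:
        raise ValueError(f"unexpected verdicts: {unknown}")
    if not votes:
        return "NOT_ENOUGH_INFO"
    verdict, count = Counter(votes.values()).most_common(1)[0]
    return verdict if count > len(votes) / 2 else "NOT_ENOUGH_INFO"


print(aggregate_votes({"gpt-4o": "REFUTES", "gpt-4o-mini": "NOT_ENOUGH_INFO"}))  # NOT_ENOUGH_INFO
print(aggregate_votes({"gpt-4o": "SUPPORTS", "gpt-4o-mini": "SUPPORTS"}))  # SUPPORTS
```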
Example: Version Requirements
README:
Agent execution:
🤖 Agent Step 1: Reasoning
→ Python version requirements typically in pyproject.toml or setup.py
🔧 Agent Step 1: Action
→ list_files(path=".", pattern="*.toml")
📄 Agent Step 1: Observation
→ Found: pyproject.toml
🔧 Agent Step 2: Action
→ read_file(path="pyproject.toml")
📄 Agent Step 2: Observation
→ requires-python = ">=3.11"
→ [project] section confirms minimum version
🤖 Agent Step 3: Reasoning
→ Found exact match in dependency specification
→ Should verify CI pipeline matches
🔧 Agent Step 3: Action
→ read_file(path=".github/workflows/ci.yml")
📄 Agent Step 3: Observation
→ python-version: ['3.11', '3.12', '3.13']
→ CI tests against 3.11 minimum
✅ Agent Step 4: Complete
→ Multiple sources confirm Python 3.11+ requirement
Result:
✅ Requires Python 3.11 or higher → SUPPORTS (98% confidence)
📁 Evidence:
• pyproject.toml:10 (requires-python = ">=3.11")
• .github/workflows/ci.yml:18 (python-version: ['3.11', '3.12', '3.13'])
🗳️ Votes: gpt-4o: SUPPORTS, gpt-4o-mini: SUPPORTS
Example: Feature Claims
README:
Agent execution:
🤖 Agent Step 1: Reasoning
→ Looking for async function definitions and await usage
🔧 Agent Step 1: Action
→ grep_files(pattern="async def", file_pattern="**/*.py")
📄 Agent Step 1: Observation
→ Found 23 matches across 8 files:
• src/graph.py: async def ainvoke
• src/chains/verification.py: async def verify
• src/agents/evidence_agent.py: async def search
• ...
🤖 Agent Step 2: Reasoning
→ Multiple async functions found, need to verify concurrent processing claim
🔧 Agent Step 2: Action
→ grep_files(pattern="asyncio.gather|await.*await", file_pattern="**/*.py")
📄 Agent Step 2: Observation
→ Found concurrent patterns in src/chains/verification.py:
• results = await asyncio.gather(*[model.verify(...) for model in models])
🤖 Agent Step 3: Reasoning
→ Evidence confirms both async/await support AND concurrent processing
✅ Agent Step 3: Complete
→ Claim fully supported by codebase
Result:
✅ Supports async/await patterns → SUPPORTS (95% confidence)
📁 Evidence:
• src/graph.py:45 (async def ainvoke)
• src/chains/verification.py:78 (async def verify)
• 21 other async functions found
✅ Concurrent processing → SUPPORTS (92% confidence)
📁 Evidence:
• src/chains/verification.py:112 (asyncio.gather for parallel verification)
🗳️ Votes: gpt-4o: SUPPORTS, gpt-4o-mini: SUPPORTS
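Regex searches such as `async def` or `asyncio.gather|await.*await` can also match comments and strings. As an illustrative alternative, not what the agent does, Python's `ast` module can find `async def` definitions and `asyncio.gather` calls structurally. The sample source below is hypothetical.

```python
import ast

SAMPLE = '''
import asyncio

async def verify(model, claim):
    return await model.check(claim)

async def verify_all(models, claim):
    # asyncio.gather in a comment is ignored by the AST
    return await asyncio.gather(*[verify(m, claim) for m in models])

def sync_helper():
    return "not async"
'''


def find_async_features(source: str) -> dict[str, list]:
    """Return async function names and the line numbers of asyncio.gather(...) calls."""
    tree = ast.parse(source)
    async_defs = [n.name for n in ast.walk(tree) if isinstance(n, ast.AsyncFunctionDef)]
    gather_lines = [
        n.lineno
        for n in ast.walk(tree)
        if isinstance(n, ast.Call)
        and isinstance(n.func, ast.Attribute)
        and n.func.attr == "gather"
        and isinstance(n.func.value, ast.Name)
        and n.func.value.id == "asyncio"
    ]
    return {"async_defs": async_defs, "gather_lines": gather_lines}


print(find_async_features(SAMPLE))
```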
Python API
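```python
import asyncio
from truthfulness_evaluator.evidence.agent import FilesystemEvidenceAgent

async def main():
    agent = FilesystemEvidenceAgent(root_path="./src")
    evidence = await agent.search("process function accepts DataFrame")
    for e in evidence:
        print(f"{e['file_path']}: {e['content'][:100]}")

asyncio.run(main())  # search() is a coroutine, so it must run inside an event loop
```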
Custom File Patterns
To search different file types by default, change the file_pattern default in tools/filesystem.py:

```python
@tool
def grep_files(pattern: str, file_pattern: str = "*.py"):
    """Search in Python files by default."""
    ...
```
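When changing `file_pattern`, keep in mind the difference between `*.py` and `**/*.py`, which both appear on this page. The page does not say how the tool applies the pattern. Assuming it uses `pathlib`-style globbing, `*.py` matches only files directly under the search root, while `**/*.py` also descends into subdirectories:

```python
import tempfile
from pathlib import Path

# Illustrates pathlib glob semantics (assumed, not confirmed, to match grep_files).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "src" / "core").mkdir(parents=True)
    (root / "setup.py").write_text("")
    (root / "src" / "core" / "process.py").write_text("")

    top_level = sorted(p.relative_to(root).as_posix() for p in root.glob("*.py"))
    recursive = sorted(p.relative_to(root).as_posix() for p in root.glob("**/*.py"))

print(top_level)  # ['setup.py']
print(recursive)  # ['setup.py', 'src/core/process.py']
```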
Security
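The agent is scoped to root_path:

```python
from truthfulness_evaluator.evidence.agent import FilesystemEvidenceAgent

# Safe: file access stays under ./project
agent = FilesystemEvidenceAgent(root_path="./project")

# Blocks access to parent directories
agent.read_file("../../etc/passwd")  # Error: Outside allowed directory
```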
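This page does not show how the scoping is enforced. A common approach, sketched here on that assumption, is to resolve each requested path against the root and reject any path that ends up outside it. Because `resolve()` also follows symlinks, a link that points outside the root is rejected as well. `safe_path` is a hypothetical helper.

```python
from pathlib import Path


def safe_path(root: str, requested: str) -> Path:
    """Hypothetical guard: resolve `requested` under `root`, refusing anything outside it."""
    root_dir = Path(root).resolve()
    candidate = (root_dir / requested).resolve()  # collapses "..", follows symlinks
    if not candidate.is_relative_to(root_dir):  # Python 3.9+
        raise PermissionError(f"Outside allowed directory: {requested}")
    return candidate


print(safe_path("./project", "src/core.py"))
try:
    safe_path("./project", "../../etc/passwd")
except PermissionError as exc:
    print(exc)  # Outside allowed directory: ../../etc/passwd
```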
Performance Tips
| Issue | Solution |
|---|---|
| Large repos | Exclude node_modules/, .git/ |
| Slow search | Limit the size of files that are searched |
| Many files | Use specific file patterns |
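The three tips can be combined into one file walker. This is a sketch and not the agent's code. `iter_candidate_files`, `EXCLUDED_DIRS`, and the 1 MB default limit are illustrative choices. Pruning `dirnames` in place stops `os.walk` from entering the excluded directories at all, which keeps the exclusion cheap even in large repos.

```python
import fnmatch
import os
from pathlib import Path
from typing import Iterator

EXCLUDED_DIRS = {".git", "node_modules", "__pycache__", ".venv"}  # illustrative set


def iter_candidate_files(root: str, file_pattern: str = "*.py",
                         max_bytes: int = 1_000_000) -> Iterator[Path]:
    """Yield files under `root` that match `file_pattern`, skipping excluded dirs and large files."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into excluded directories.
        dirnames[:] = [d for d in dirnames if d not in EXCLUDED_DIRS]
        for name in filenames:
            if not fnmatch.fnmatch(name, file_pattern):
                continue
            path = Path(dirpath) / name
            if path.stat().st_size <= max_bytes:  # skip files above the size limit
                yield path
```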
Troubleshooting
No Evidence Found
```bash
# Check that the root path is correct
truth-eval README.md --root-path ./src  # Not just .

# Verify files exist
ls ./src/*.py
```
Wrong Files Read
The agent uses LLM reasoning to choose which files to read. If it picks the wrong files:
- Improve claim specificity
- Add more context to README
- Use explicit file references