# Verification Methods

Deep dive into Check's seven verification paradigms and when to use each one.

Check uses a multi-paradigm approach to verification, combining seven distinct methods to determine the truthfulness of claims. Each method has different strengths, false positive rates (FPR), and use cases.

## Overview
| Method | FPR | Speed | Best For |
|---|---|---|---|
| Semantic | ~20-30% | Fast | Quick sanity checks |
| Reasoning | ~4-5% | Medium | General fact verification |
| Tool | ~1% | Slow | Real-time fact-checking |
| BIPRM | ~5-7% | Medium | Complex reasoning chains |
| Ensemble | ~15-20% | Slow | High-stakes decisions |
| Entropy | ~8-10% | Medium | Consistency validation |
| Formal | 0% | Fast | Mathematical claims |
FPR (False Positive Rate) = Probability of incorrectly verifying a false claim as true. Lower is better.
## Method Details

### Semantic

Embedding-based similarity analysis for fast claim comparison against known facts.
| Property | Value |
|---|---|
| FPR | ~20-30% |
| Speed | Fast (~100ms) |
| Cost | Low |
How it works:
- Converts the claim to vector embeddings
- Compares against known reference facts
- Returns a similarity-based confidence score
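The steps above can be sketched roughly as follows. This is an illustration, not Check's internals: the embedding model and reference-fact store are assumed, and the claim/reference vectors below are stand-ins for real embeddings.

```typescript
// Cosine similarity between two embedding vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Confidence = best similarity against any known reference fact.
function semanticConfidence(claim: number[], references: number[][]): number {
  return Math.max(...references.map((ref) => cosine(claim, ref)));
}
```

Because the score is purely similarity-based, a claim with no close reference fact gets a low score even if true, which is why this method cannot verify novel claims.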
Best for:
- Quick sanity checks
- Filtering obviously false claims
- High-volume, low-stakes verification
Limitations:
- Highest FPR of the seven methods
- Cannot verify novel claims
- Struggles with nuanced statements
```typescript
const result = await client.verifyAndWait({
  content: 'Paris is the capital of France.',
  methods: { semantic: 1.0 }
});
```

### Reasoning
LLM-based chain-of-thought verification with adaptive self-refinement for borderline cases.
| Property | Value |
|---|---|
| FPR | ~4-5% |
| Speed | Medium (~1-2s) |
| Cost | Medium |
How it works:
- Sends the claim to an LLM for structured analysis
- For borderline confidence cases, automatically triggers self-critique to refine the verdict
- Self-critique challenges the initial reasoning and produces a refined verdict
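The borderline-triggered self-critique flow can be sketched like this. Everything here is a hypothetical illustration: `askModel` stands in for a real LLM call (kept synchronous for brevity), and the 0.4–0.7 borderline band is an assumed threshold, not Check's actual one.

```typescript
type Verdict = { verdict: 'true' | 'false' | 'uncertain'; confidence: number };

function reasoningVerify(
  claim: string,
  askModel: (prompt: string) => Verdict
): Verdict {
  // First pass: structured chain-of-thought analysis.
  const initial = askModel(`Analyze step by step: ${claim}`);
  // A borderline first verdict triggers a self-critique pass that
  // challenges the initial reasoning and produces a refined verdict.
  if (initial.confidence >= 0.4 && initial.confidence <= 0.7) {
    return askModel(`Critique and refine the verdict for: ${claim}`);
  }
  return initial;
}
```

Note how the second call only happens on borderline cases, which is why self-critique adds latency only for those inputs.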
Best for:
- General fact verification
- Claims requiring contextual understanding
- Balance of speed and accuracy
Limitations:
- Relies on LLM's training data
- May not have recent information
- Self-critique adds latency for borderline cases
```typescript
const result = await client.verifyAndWait({
  content: 'The Great Wall of China is visible from space.',
  methods: { reasoning: 1.0 }
});
```

### Tool
Web search-grounded verification with source-level evidence analysis.
| Property | Value |
|---|---|
| FPR | ~1% |
| Speed | Slow (~2-5s) |
| Cost | High |
How it works:
- Extracts key claims from the input
- Searches authoritative web sources
- Weighs supporting vs. contradicting evidence
- Returns verdict with source citations
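The evidence-weighing step can be sketched as below. The search and stance-classification stages are assumed to have already run; the `Evidence` shape and the 0.3/0.7 thresholds are illustrative, not Check's actual types or cutoffs.

```typescript
interface Evidence {
  source: string;                      // URL of the web source
  stance: 'supports' | 'contradicts';  // classified relation to the claim
  weight: number;                      // source authority/relevance score
}

function toolVerdict(evidence: Evidence[]): {
  verdict: 'true' | 'false' | 'uncertain';
  sources: string[];
} {
  const support = evidence
    .filter((e) => e.stance === 'supports')
    .reduce((s, e) => s + e.weight, 0);
  const contra = evidence
    .filter((e) => e.stance === 'contradicts')
    .reduce((s, e) => s + e.weight, 0);
  const total = support + contra;
  if (total === 0) return { verdict: 'uncertain', sources: [] };
  const ratio = support / total;
  const verdict = ratio > 0.7 ? 'true' : ratio < 0.3 ? 'false' : 'uncertain';
  return { verdict, sources: evidence.map((e) => e.source) };
}
```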
Best for:
- Fact-checking current events
- Verifying statistics and numbers
- Claims about real-world entities
Limitations:
- Slower due to web requests
- Depends on source availability
- Higher cost per verification
```typescript
const result = await client.verifyAndWait({
  content: 'The population of Tokyo is over 14 million.',
  methods: { tool: 1.0 }
});

// Returns sources in paradigmResults
console.log(result.paradigmResults[0].sources);
// ['https://en.wikipedia.org/wiki/Tokyo', ...]
```

### BIPRM
Bidirectional step-by-step reasoning that analyzes claims from multiple angles to reduce bias.
| Property | Value |
|---|---|
| FPR | ~5-7% |
| Speed | Medium (~1-3s) |
| Cost | Medium-High |
How it works:
- Runs forward and reverse reasoning in parallel
- Scores each reasoning step independently
- Aggregates bidirectional scores for a debiased verdict
- When combined with Formal method, cross-validates results
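The bidirectional aggregation step can be sketched as below. How each direction's per-step scores are produced is assumed to happen upstream; the simple averaging here is an illustration of the debiasing idea, not Check's exact math.

```typescript
// Combine per-step scores from forward and reverse reasoning passes.
// Averaging both directions damps bias from a single reasoning direction.
function biprmScore(forwardSteps: number[], reverseSteps: number[]): number {
  const avg = (xs: number[]) => xs.reduce((s, x) => s + x, 0) / xs.length;
  return (avg(forwardSteps) + avg(reverseSteps)) / 2;
}
```

A claim whose forward chain looks strong but whose reverse chain is weak ends up near the middle, which is also why the method can be indecisive on clear-cut claims.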
Best for:
- Controversial or debatable claims
- Claims with multiple interpretations
- Content requiring nuanced analysis
Limitations:
- May be indecisive on clear-cut claims
- Higher cost than single-pass reasoning
- Can be slower for simple claims
```typescript
const result = await client.verifyAndWait({
  content: 'Electric vehicles are better for the environment than gas cars.',
  methods: { biprm: 1.0 }
});
```

### Ensemble
Multi-model consensus with intelligent dispute resolution when models disagree.
| Property | Value |
|---|---|
| FPR | ~15-20% |
| Speed | Slow (~3-5s) |
| Cost | High |
How it works:
- Sends claim to multiple LLM providers (GPT-4, Claude, Gemini, etc.)
- Collects verdicts from each model
- When members disagree, a conditional meta-judge adjudicates the dispute
- Returns consensus verdict with confidence
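The conditional meta-judge logic can be sketched as follows. The `metaJudge` callback stands in for a stronger adjudicating model and is an assumption for illustration; it is invoked only when member verdicts disagree.

```typescript
type MemberVerdict = 'true' | 'false' | 'uncertain';

function ensembleVerdict(
  members: MemberVerdict[],
  metaJudge: (votes: MemberVerdict[]) => MemberVerdict
): MemberVerdict {
  const unique = new Set(members);
  // Unanimous members need no adjudication; disputes go to the meta-judge.
  return unique.size === 1 ? members[0] : metaJudge(members);
}
```

Skipping the meta-judge on unanimous votes keeps cost down, but note that unanimity does not rule out correlated errors across models.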
Best for:
- High-stakes decisions requiring agreement
- Reducing single-model bias
- Important content where accuracy is critical
Limitations:
- Highest cost (multiple API calls)
- Slowest method
- Can still have correlated errors
```typescript
const result = await client.verifyAndWait({
  content: 'This medication is safe for children under 12.',
  methods: { ensemble: 1.0 }
});
```

### Entropy
Semantic consistency analysis that measures how stable a claim's verification is across rephrasing.
| Property | Value |
|---|---|
| FPR | ~8-10% |
| Speed | Medium (~2-3s) |
| Cost | Medium |
How it works:
- Generates multiple rephrasings of the claim
- Verifies each rephrasing independently
- Measures consistency across results — high consistency = high confidence
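One way to turn per-rephrasing verdicts into a consistency score is via normalized Shannon entropy of the verdict distribution; the exact formula here is an illustrative assumption, not Check's actual math.

```typescript
// Score = 1 - normalized entropy of the verdicts across rephrasings.
// All verdicts identical -> 1 (fully consistent); all different -> 0.
function consistencyScore(verdicts: string[]): number {
  const counts = new Map<string, number>();
  for (const v of verdicts) counts.set(v, (counts.get(v) ?? 0) + 1);
  const n = verdicts.length;
  let entropy = 0;
  for (const c of counts.values()) {
    const p = c / n;
    entropy -= p * Math.log2(p);
  }
  const maxEntropy = Math.log2(n); // entropy if every verdict differed
  return maxEntropy === 0 ? 1 : 1 - entropy / maxEntropy;
}
```

This also makes the key limitation concrete: a model that is consistently wrong across rephrasings still scores high.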
Best for:
- Detecting hallucinations
- Checking reasoning consistency
- Claims that might be phrasing-dependent
Limitations:
- Can give false confidence on consistently wrong claims
- Behavioral path adds latency for rephrasings
- Logprobs path depends on provider support
```typescript
const result = await client.verifyAndWait({
  content: 'The CEO of Apple in 2023 was Tim Cook.',
  methods: { entropy: 1.0 }
});
```

### Formal
Symbolic mathematical verification with zero false positive rate for supported claim types.
| Property | Value |
|---|---|
| FPR | 0% |
| Speed | Fast (~100-500ms) |
| Cost | Low |
How it works:
- Classifies claims into types: arithmetic, algebraic, comparison, statistical, unit conversion
- Translates the claim into a symbolic expression
- Evaluates the expression formally for 0% FPR (mathematical certainty)
- Claims outside supported categories fall back to LLM-based reasoning (~5-7% FPR)
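A minimal sketch of the formal path for one supported category, arithmetic equalities of the form `a <op> b = c`. Real symbolic evaluation covers more claim types and more robust parsing; the grammar and fallback signal below are deliberately naive illustrations.

```typescript
// Classify-then-evaluate for simple arithmetic equality claims.
// Claims that don't match the supported pattern are reported as
// 'unsupported' (mirroring the fallback to LLM-based reasoning).
function formalVerify(claim: string): 'true' | 'false' | 'unsupported' {
  const m = claim.match(
    /^(-?\d+(?:\.\d+)?)\s*([+\-*/^])\s*(-?\d+(?:\.\d+)?)\s*=\s*(-?\d+(?:\.\d+)?)$/
  );
  if (!m) return 'unsupported';
  const [, a, op, b, c] = m;
  const x = Number(a), y = Number(b), z = Number(c);
  const value =
    op === '+' ? x + y :
    op === '-' ? x - y :
    op === '*' ? x * y :
    op === '/' ? x / y :
    Math.pow(x, y);
  // Exact symbolic/numeric evaluation gives certainty for this category.
  return Math.abs(value - z) < 1e-9 ? 'true' : 'false';
}
```

The 0% FPR applies only inside the supported pattern; anything the classifier cannot formalize must take the fallback path.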
Best for:
- Arithmetic calculations
- Algebraic identities
- Numerical comparisons
- Statistical claims
- Unit conversions
Limitations:
- Only works for formalizable claims within supported categories
- Cannot verify subjective statements
- Only supports specific mathematical categories
```typescript
// Arithmetic verification
const result = await client.verifyAndWait({
  content: 'The square root of 144 is 12.',
  methods: { formal: 1.0 }
});
```

```typescript
// Logical verification
const result = await client.verifyAndWait({
  content: 'If A implies B and B implies C, then A implies C.',
  methods: { formal: 1.0 }
});
```

## Combining Methods
Methods can be combined with weights to balance accuracy and speed:

### Fast + Reasonably Accurate (Chatbots)

```typescript
const result = await client.verifyAndWait({
  content: 'Your claim here',
  methods: { reasoning: 1.0 }
});
```

- ~1-2 second response
- ~4-5% FPR
- Good for real-time applications
### High Accuracy (Important Content)

```typescript
const result = await client.verifyAndWait({
  content: 'Critical claim requiring verification',
  methods: {
    reasoning: 1.0,
    tool: 1.0,
    biprm: 0.5
  }
});
```

- 3-5 second response
- ~2-5% FPR
- Good for healthcare, legal, financial
### Maximum Confidence (Critical Decisions)

```typescript
const result = await client.verifyAndWait({
  content: 'Claim with major consequences if wrong',
  methods: {
    reasoning: 1.0,
    tool: 1.0,
    ensemble: 1.0,
    biprm: 0.5
  }
});
```

- 5-10 second response
- ~1-2% FPR
- Good for publication, legal documents
### Mathematical Claims

```typescript
const result = await client.verifyAndWait({
  content: '2^10 = 1024',
  methods: { formal: 1.0 }
});
```

- ~100ms response
- 0% FPR
- Perfect for calculations
## How Results Are Aggregated

Check uses FPR-weighted voting to combine results from multiple methods:

1. Each method returns:
   - Verdict: `true`, `false`, or `uncertain`
   - Confidence: 0.0 to 1.0
2. Weights are calculated based on:
   - Method's FPR (lower FPR = higher weight)
   - User-specified method weight
   - Method's confidence in its verdict
3. Final verdict is determined by:
   - Weighted vote across all methods
   - If methods strongly disagree, verdict becomes `uncertain`
4. Decision thresholds:

| Confidence | Decision | Condition |
|---|---|---|
| >= 0.95 | `accept` | High confidence |
| 0.70 – 0.95 | `refine` | Moderate confidence |
| < 0.70 | `escalate` | Low confidence + paradigm disagreement |
| < 0.70 | `reject` | Low confidence + paradigm agreement |
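The FPR-weighted vote can be sketched as follows. The `MethodResult` shape, the weight formula, and the 0.3/0.7 disagreement band are illustrative assumptions, not Check's exact internals.

```typescript
interface MethodResult {
  verdict: 'true' | 'false';
  confidence: number;  // method's confidence in its verdict, 0.0-1.0
  fpr: number;         // method's false positive rate, e.g. 0.05
  userWeight: number;  // user-specified weight from `methods`
}

function aggregate(results: MethodResult[]): {
  verdict: 'true' | 'false' | 'uncertain';
  confidence: number;
} {
  let forTrue = 0;
  let total = 0;
  for (const r of results) {
    // Lower FPR, higher user weight, and higher confidence all
    // increase a method's influence on the final vote.
    const w = (1 - r.fpr) * r.userWeight * r.confidence;
    total += w;
    if (r.verdict === 'true') forTrue += w;
  }
  const confidence = total === 0 ? 0 : forTrue / total;
  // A near 50/50 weighted split means the methods strongly disagree.
  const verdict =
    confidence > 0.7 ? 'true' : confidence < 0.3 ? 'false' : 'uncertain';
  return { verdict, confidence };
}
```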
## Choosing Methods by Use Case

### Real-Time Applications

Chatbots, live responses:

```typescript
methods: { reasoning: 1.0 }
```

### Content Moderation

Social media, user-generated content:

```typescript
methods: { reasoning: 1.0, tool: 0.5 }
```

### Healthcare

Medical information verification:

```typescript
methods: { reasoning: 1.0, tool: 1.0, ensemble: 1.0 }
```

### Finance

Financial claims and data:

```typescript
methods: { reasoning: 1.0, tool: 1.0, formal: 0.5 }
```

### Legal

Legal documents and claims:

```typescript
methods: { reasoning: 1.0, tool: 1.0, biprm: 1.0 }
```

### Education

Textbook content verification:

```typescript
methods: { reasoning: 1.0, tool: 1.0 }
```

### Scientific

Research claims:

```typescript
methods: { reasoning: 1.0, tool: 1.0, formal: 0.5, entropy: 0.5 }
```

## Cost Optimization
Methods have different costs. Optimize by:

- **Start with `reasoning`**: the best balance of cost and accuracy
- **Add `tool` for factual claims**: when current information is needed
- **Use `ensemble` sparingly**: only for critical decisions
- **Use `formal` for math**: the cheapest and most accurate option for calculations
```typescript
// Cost-effective pipeline: scale verification effort with claim importance
async function verify(content: string, importance: 'low' | 'medium' | 'high') {
  const methods = {
    low: { reasoning: 1.0 },
    medium: { reasoning: 1.0, tool: 0.5 },
    high: { reasoning: 1.0, tool: 1.0, ensemble: 0.5 }
  };
  return client.verifyAndWait({
    content,
    methods: methods[importance]
  });
}
```