# Verification Methods

Deep dive into Check's seven verification paradigms and when to use each one.

Check uses a multi-paradigm approach to verification, combining seven distinct methods to determine the truthfulness of claims. Each method has different strengths, false positive rates (FPR), and use cases.

## Overview
| Method | FPR | Speed | Best For |
|---|---|---|---|
| Semantic | ~20-30% | Fast | Quick sanity checks |
| Reasoning | ~4-5% | Medium | General fact verification |
| Tool | ~1% | Slow | Real-time fact-checking |
| BIPRM | ~5-7% | Medium | Complex reasoning chains |
| Ensemble | ~15-20% | Slow | High-stakes decisions |
| Entropy | ~8-10% | Medium | Consistency validation |
| Formal | 0% | Fast | Mathematical claims |
FPR (False Positive Rate) = Probability of incorrectly verifying a false claim as true. Lower is better.
## Method Details

### Semantic

Embedding-based similarity analysis for fast claim comparison against known facts.
| Property | Value |
|---|---|
| FPR | ~20-30% |
| Speed | Fast (~100ms) |
| Cost | Low |
How it works:
- Converts the claim to vector embeddings
- Compares against known reference facts
- Returns a similarity-based confidence score
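The steps above can be sketched roughly as follows. This is an illustration, not Check's internals: the embedding model and reference-fact store are assumed, and the claim/reference vectors below are stand-ins for real embeddings.

```typescript
// Cosine similarity between two embedding vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Confidence = best similarity against any known reference fact.
function semanticConfidence(claim: number[], references: number[][]): number {
  return Math.max(...references.map((ref) => cosine(claim, ref)));
}
```

Because the score is purely similarity-based, a claim with no close reference fact gets a low score even if true, which is why this method cannot verify novel claims.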
Best for:
- Quick sanity checks
- Filtering obviously false claims
- High-volume, low-stakes verification
Limitations:
- Highest FPR of the seven methods
- Cannot verify novel claims
- Struggles with nuanced statements
```typescript
const result = await client.verifyAndWait({
  content: 'Paris is the capital of France.',
  methods: { semantic: 1.0 }
});
```

### Reasoning
LLM-based chain-of-thought verification with adaptive self-refinement for borderline cases.
| Property | Value |
|---|---|
| FPR | ~4-5% |
| Speed | Medium (~1-2s) |
| Cost | Medium |
How it works:
- Sends the claim to an LLM for structured analysis
- For borderline confidence cases, automatically triggers self-critique to refine the verdict
- Self-critique challenges the initial reasoning and produces a refined verdict
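The borderline-triggered self-critique flow can be sketched like this. Everything here is a hypothetical illustration: `askModel` stands in for a real LLM call (kept synchronous for brevity), and the 0.4–0.7 borderline band is an assumed threshold, not Check's actual one.

```typescript
type Verdict = { verdict: 'true' | 'false' | 'uncertain'; confidence: number };

function reasoningVerify(
  claim: string,
  askModel: (prompt: string) => Verdict
): Verdict {
  // First pass: structured chain-of-thought analysis.
  const initial = askModel(`Analyze step by step: ${claim}`);
  // A borderline first verdict triggers a self-critique pass that
  // challenges the initial reasoning and produces a refined verdict.
  if (initial.confidence >= 0.4 && initial.confidence <= 0.7) {
    return askModel(`Critique and refine the verdict for: ${claim}`);
  }
  return initial;
}
```

Note how the second call only happens on borderline cases, which is why self-critique adds latency only for those inputs.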
Best for:
- General fact verification
- Claims requiring contextual understanding
- Balance of speed and accuracy
Limitations:
- Relies on LLM's training data
- May not have recent information
- Self-critique adds latency for borderline cases
```typescript
const result = await client.verifyAndWait({
  content: 'The Great Wall of China is visible from space.',
  methods: { reasoning: 1.0 }
});
```

### Tool
Web search-grounded verification with source-level evidence analysis.
| Property | Value |
|---|---|
| FPR | ~1% |
| Speed | Slow (~2-5s) |
| Cost | High |
How it works:
- Extracts key claims from the input
- Searches authoritative web sources
- Weighs supporting vs. contradicting evidence
- Returns verdict with source citations
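The evidence-weighing step can be sketched as below. The search and stance-classification stages are assumed to have already run; the `Evidence` shape and the 0.3/0.7 thresholds are illustrative, not Check's actual types or cutoffs.

```typescript
interface Evidence {
  source: string;                      // URL of the web source
  stance: 'supports' | 'contradicts';  // classified relation to the claim
  weight: number;                      // source authority/relevance score
}

function toolVerdict(evidence: Evidence[]): {
  verdict: 'true' | 'false' | 'uncertain';
  sources: string[];
} {
  const support = evidence
    .filter((e) => e.stance === 'supports')
    .reduce((s, e) => s + e.weight, 0);
  const contra = evidence
    .filter((e) => e.stance === 'contradicts')
    .reduce((s, e) => s + e.weight, 0);
  const total = support + contra;
  if (total === 0) return { verdict: 'uncertain', sources: [] };
  const ratio = support / total;
  const verdict = ratio > 0.7 ? 'true' : ratio < 0.3 ? 'false' : 'uncertain';
  return { verdict, sources: evidence.map((e) => e.source) };
}
```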
Best for:
- Fact-checking current events
- Verifying statistics and numbers
- Claims about real-world entities
Limitations:
- Slower due to web requests
- Depends on source availability
- Higher cost per verification
```typescript
const result = await client.verifyAndWait({
  content: 'The population of Tokyo is over 14 million.',
  methods: { tool: 1.0 }
});

// Returns sources in paradigmResults
console.log(result.paradigmResults[0].sources);
// ['https://en.wikipedia.org/wiki/Tokyo', ...]
```

### BIPRM
Bidirectional step-by-step reasoning that analyzes claims from multiple angles to reduce bias.
| Property | Value |
|---|---|
| FPR | ~5-7% |
| Speed | Medium (~1-3s) |
| Cost | Medium-High |
How it works:
- Runs forward and reverse reasoning in parallel
- Scores each reasoning step independently
- Aggregates bidirectional scores for a debiased verdict
- When combined with Formal method, cross-validates results
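The bidirectional aggregation step can be sketched as below. How each direction's per-step scores are produced is assumed to happen upstream; the simple averaging here is an illustration of the debiasing idea, not Check's exact math.

```typescript
// Combine per-step scores from forward and reverse reasoning passes.
// Averaging both directions damps bias from a single reasoning direction.
function biprmScore(forwardSteps: number[], reverseSteps: number[]): number {
  const avg = (xs: number[]) => xs.reduce((s, x) => s + x, 0) / xs.length;
  return (avg(forwardSteps) + avg(reverseSteps)) / 2;
}
```

A claim whose forward chain looks strong but whose reverse chain is weak ends up near the middle, which is also why the method can be indecisive on clear-cut claims.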
Best for:
- Controversial or debatable claims
- Claims with multiple interpretations
- Content requiring nuanced analysis
Limitations:
- May be indecisive on clear-cut claims
- Higher cost than single-pass reasoning
- Can be slower for simple claims
```typescript
const result = await client.verifyAndWait({
  content: 'Electric vehicles are better for the environment than gas cars.',
  methods: { biprm: 1.0 }
});
```

### Ensemble
Multi-model consensus with intelligent dispute resolution when models disagree.
| Property | Value |
|---|---|
| FPR | ~15-20% |
| Speed | Slow (~3-5s) |
| Cost | High |
How it works:
- Sends claim to multiple LLM providers (GPT-4, Claude, Gemini, etc.)
- Collects verdicts from each model
- When members disagree, a conditional meta-judge adjudicates the dispute
- Returns consensus verdict with confidence
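The conditional meta-judge logic can be sketched as follows. The `metaJudge` callback stands in for a stronger adjudicating model and is an assumption for illustration; it is invoked only when member verdicts disagree.

```typescript
type MemberVerdict = 'true' | 'false' | 'uncertain';

function ensembleVerdict(
  members: MemberVerdict[],
  metaJudge: (votes: MemberVerdict[]) => MemberVerdict
): MemberVerdict {
  const unique = new Set(members);
  // Unanimous members need no adjudication; disputes go to the meta-judge.
  return unique.size === 1 ? members[0] : metaJudge(members);
}
```

Skipping the meta-judge on unanimous votes keeps cost down, but note that unanimity does not rule out correlated errors across models.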
Best for:
- High-stakes decisions requiring agreement
- Reducing single-model bias
- Important content where accuracy is critical
Limitations:
- Highest cost (multiple API calls)
- Slowest method
- Can still have correlated errors
```typescript
const result = await client.verifyAndWait({
  content: 'This medication is safe for children under 12.',
  methods: { ensemble: 1.0 }
});
```

### Entropy
Semantic consistency analysis that measures how stable a claim's verification is across rephrasing.
| Property | Value |
|---|---|
| FPR | ~8-10% |
| Speed | Medium (~2-3s) |
| Cost | Medium |
How it works:
- Generates multiple rephrasings of the claim
- Verifies each rephrasing independently
- Measures consistency across results — high consistency = high confidence
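One way to turn per-rephrasing verdicts into a consistency score is via normalized Shannon entropy of the verdict distribution; the exact formula here is an illustrative assumption, not Check's actual math.

```typescript
// Score = 1 - normalized entropy of the verdicts across rephrasings.
// All verdicts identical -> 1 (fully consistent); all different -> 0.
function consistencyScore(verdicts: string[]): number {
  const counts = new Map<string, number>();
  for (const v of verdicts) counts.set(v, (counts.get(v) ?? 0) + 1);
  const n = verdicts.length;
  let entropy = 0;
  for (const c of counts.values()) {
    const p = c / n;
    entropy -= p * Math.log2(p);
  }
  const maxEntropy = Math.log2(n); // entropy if every verdict differed
  return maxEntropy === 0 ? 1 : 1 - entropy / maxEntropy;
}
```

This also makes the key limitation concrete: a model that is consistently wrong across rephrasings still scores high.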
Best for:
- Detecting hallucinations
- Checking reasoning consistency
- Claims that might be phrasing-dependent
Limitations:
- Can give false confidence on consistently wrong claims
- Behavioral path adds latency for rephrasings
- Logprobs path depends on provider support
```typescript
const result = await client.verifyAndWait({
  content: 'The CEO of Apple in 2023 was Tim Cook.',
  methods: { entropy: 1.0 }
});
```

### Formal
Symbolic mathematical verification with zero false positive rate for supported claim types.
| Property | Value |
|---|---|
| FPR | 0% |
| Speed | Fast (~100-500ms) |
| Cost | Low |
How it works:
- Classifies claims into types: arithmetic, algebraic, comparison, statistical, unit conversion
- Translates the claim into a symbolic expression
- Evaluates the expression formally for 0% FPR (mathematical certainty)
- Claims outside supported categories fall back to LLM-based reasoning (~5-7% FPR)
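A minimal sketch of the formal path for one supported category, arithmetic equalities of the form `a <op> b = c`. Real symbolic evaluation covers more claim types and more robust parsing; the grammar and fallback signal below are deliberately naive illustrations.

```typescript
// Classify-then-evaluate for simple arithmetic equality claims.
// Claims that don't match the supported pattern are reported as
// 'unsupported' (mirroring the fallback to LLM-based reasoning).
function formalVerify(claim: string): 'true' | 'false' | 'unsupported' {
  const m = claim.match(
    /^(-?\d+(?:\.\d+)?)\s*([+\-*/^])\s*(-?\d+(?:\.\d+)?)\s*=\s*(-?\d+(?:\.\d+)?)$/
  );
  if (!m) return 'unsupported';
  const [, a, op, b, c] = m;
  const x = Number(a), y = Number(b), z = Number(c);
  const value =
    op === '+' ? x + y :
    op === '-' ? x - y :
    op === '*' ? x * y :
    op === '/' ? x / y :
    Math.pow(x, y);
  // Exact symbolic/numeric evaluation gives certainty for this category.
  return Math.abs(value - z) < 1e-9 ? 'true' : 'false';
}
```

The 0% FPR applies only inside the supported pattern; anything the classifier cannot formalize must take the fallback path.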
Best for:
- Arithmetic calculations
- Algebraic identities
- Numerical comparisons
- Statistical claims
- Unit conversions
Limitations:
- Only works for formalizable claims within supported categories
- Cannot verify subjective statements
- Only supports specific mathematical categories
```typescript
// Arithmetic verification
const result = await client.verifyAndWait({
  content: 'The square root of 144 is 12.',
  methods: { formal: 1.0 }
});
```

```typescript
// Logical verification
const result = await client.verifyAndWait({
  content: 'If A implies B and B implies C, then A implies C.',
  methods: { formal: 1.0 }
});
```

## Combining Methods
Methods can be combined with weights to balance accuracy and speed:

### Fast + Reasonably Accurate (Chatbots)

```typescript
const result = await client.verifyAndWait({
  content: 'Your claim here',
  methods: { reasoning: 1.0 }
});
```

- ~1-2 second response
- ~4-5% FPR
- Good for real-time applications
### High Accuracy (Important Content)

```typescript
const result = await client.verifyAndWait({
  content: 'Critical claim requiring verification',
  methods: {
    reasoning: 1.0,
    tool: 1.0,
    biprm: 0.5
  }
});
```

- 3-5 second response
- ~2-5% FPR
- Good for healthcare, legal, financial
### Maximum Confidence (Critical Decisions)

```typescript
const result = await client.verifyAndWait({
  content: 'Claim with major consequences if wrong',
  methods: {
    reasoning: 1.0,
    tool: 1.0,
    ensemble: 1.0,
    biprm: 0.5
  }
});
```

- 5-10 second response
- ~1-2% FPR
- Good for publication, legal documents
### Mathematical Claims

```typescript
const result = await client.verifyAndWait({
  content: '2^10 = 1024',
  methods: { formal: 1.0 }
});
```

- ~100ms response
- 0% FPR
- Perfect for calculations
## How Results Are Aggregated

Check uses FPR-weighted voting to combine results from multiple methods:

1. Each method returns:
   - Verdict: `true`, `false`, or `uncertain`
   - Confidence: 0.0 to 1.0
2. Weights are calculated based on:
   - Method's FPR (lower FPR = higher weight)
   - User-specified method weight
   - Method's confidence in its verdict
3. Final verdict is determined by:
   - Weighted vote across all methods
   - If methods strongly disagree, verdict becomes `uncertain`
4. Decision thresholds:

| Confidence | Decision | Condition |
|---|---|---|
| >= 0.95 | `accept` | High confidence |
| 0.70 – 0.95 | `refine` | Moderate confidence |
| < 0.70 | `escalate` | Low confidence + paradigm disagreement |
| < 0.70 | `reject` | Low confidence + paradigm agreement |
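The FPR-weighted vote can be sketched as follows. The `MethodResult` shape, the weight formula, and the 0.3/0.7 disagreement band are illustrative assumptions, not Check's exact internals.

```typescript
interface MethodResult {
  verdict: 'true' | 'false';
  confidence: number;  // method's confidence in its verdict, 0.0-1.0
  fpr: number;         // method's false positive rate, e.g. 0.05
  userWeight: number;  // user-specified weight from `methods`
}

function aggregate(results: MethodResult[]): {
  verdict: 'true' | 'false' | 'uncertain';
  confidence: number;
} {
  let forTrue = 0;
  let total = 0;
  for (const r of results) {
    // Lower FPR, higher user weight, and higher confidence all
    // increase a method's influence on the final vote.
    const w = (1 - r.fpr) * r.userWeight * r.confidence;
    total += w;
    if (r.verdict === 'true') forTrue += w;
  }
  const confidence = total === 0 ? 0 : forTrue / total;
  // A near 50/50 weighted split means the methods strongly disagree.
  const verdict =
    confidence > 0.7 ? 'true' : confidence < 0.3 ? 'false' : 'uncertain';
  return { verdict, confidence };
}
```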
## Choosing Methods by Use Case

### Real-Time Applications

Chatbots, live responses:

```typescript
methods: { reasoning: 1.0 }
```

### Content Moderation

Social media, user-generated content:

```typescript
methods: { reasoning: 1.0, tool: 0.5 }
```

### Healthcare

Medical information verification:

```typescript
methods: { reasoning: 1.0, tool: 1.0, ensemble: 1.0 }
```

### Finance

Financial claims and data:

```typescript
methods: { reasoning: 1.0, tool: 1.0, formal: 0.5 }
```

### Legal

Legal documents and claims:

```typescript
methods: { reasoning: 1.0, tool: 1.0, biprm: 1.0 }
```

### Education

Textbook content verification:

```typescript
methods: { reasoning: 1.0, tool: 1.0 }
```

### Scientific

Research claims:

```typescript
methods: { reasoning: 1.0, tool: 1.0, formal: 0.5, entropy: 0.5 }
```

## Cost Optimization
Methods have different costs. Optimize by:

- **Start with `reasoning`**: the best balance of cost and accuracy
- **Add `tool` for factual claims**: when current information is needed
- **Use `ensemble` sparingly**: only for critical decisions
- **Use `formal` for math**: the cheapest and most accurate option for calculations
```typescript
// Cost-effective pipeline: scale verification effort with claim importance
async function verify(content: string, importance: 'low' | 'medium' | 'high') {
  const methods = {
    low: { reasoning: 1.0 },
    medium: { reasoning: 1.0, tool: 0.5 },
    high: { reasoning: 1.0, tool: 1.0, ensemble: 0.5 }
  };
  return client.verifyAndWait({
    content,
    methods: methods[importance]
  });
}
```