RAG Failure Checklist for Multi-Agent Trading Workflows

A compact debugging checklist for multi-agent trading systems that use LLMs, retrieval, tools, role-based analysts, and staged decision handoffs.

This page is designed for cases where the system sounds fluent, but the output is clearly wrong, inconsistent, fragile, or difficult to trust.

What this page is for

Use this checklist when you see symptoms like:

analysts disagreeing but the final decision does not explain why one side won
a trade recommendation that cannot be traced back to clear evidence
stale context leaking into a new run
a polished explanation built on weak or partial support
role handoffs that lose constraints, caveats, or key market assumptions

This page is not a strategy guide, a backtest framework, or a portfolio construction manual.

Its purpose is simpler:

improve the first diagnostic cut
reduce wasted debugging cycles
help identify which layer is failing before editing prompts blindly

Summary table

#	problem domain	what breaks	quick link
1	Data reality mismatch	retrieval brings the wrong asset, wrong regime, or wrong context	No.1
2	Interpretation collapse	the retrieved material is relevant, but the logic built on top of it is wrong	No.2
3	Long reasoning chain drift	multi-step analysis slowly shifts away from the original task	No.3
4	Bluffing and overconfidence	the model sounds certain without enough support	No.4
5	Semantic mismatch in retrieval	surface similarity replaces true relevance	No.5
6	Logic collapse and recovery failure	the reasoning path hits a dead end but does not recover safely	No.6
7	Memory breaks across sessions	earlier context or assumptions are lost or mixed	No.7
8	Debugging as a black box	a bad output appears, but the failure path is not visible	No.8
9	Entropy collapse in outputs	outputs become noisy, repetitive, or structurally incoherent	No.9
10	Creative freeze in analysis	the system gives flat and literal analysis when synthesis is needed	No.10
11	Symbolic collapse	abstract or logical prompts break under structured reasoning pressure	No.11
12	Philosophical recursion	self-reference loops or paradox-like reasoning trap the workflow	No.12
13	Multi-agent chaos	agent roles overwrite, conflict, or misalign without resolution	No.13
14	Bootstrap ordering failures	components act before dependencies or context are ready	No.14
15	Deployment deadlock assumptions	system stages wait on each other in the wrong order	No.15
16	Pre-deploy collapse	version skew, missing secrets, or incomplete setup breaks the first live run	No.16

How to use this checklist

When a run goes wrong, do not start by rewriting prompts.

Use this order first:

1. verify that the input reality is correct
2. verify that the retrieved evidence is fresh and relevant
3. verify that role handoffs preserve assumptions and constraints
4. verify that disagreement is resolved explicitly
5. verify that the final answer can be traced back to evidence
6. only then tune prompts, wording, or style

Multiple failure patterns may appear at the same time. The goal is not to force everything into one box. The goal is to avoid the wrong first cut.
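
The order above can be sketched as a small triage runner. This is only an illustration, not part of TradingAgents: the run record, its flag names, and the check labels are hypothetical stand-ins for whatever your pipeline actually records. It returns every failing layer, earliest first, since several patterns may appear at once.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical run record: each flag is something your pipeline would compute.
Run = Dict[str, bool]
Check = Tuple[str, Callable[[Run], bool]]

# Checks in checklist order; prompt tuning is deliberately not in the list.
TRIAGE_ORDER: List[Check] = [
    ("input reality", lambda r: r.get("inputs_match_task", False)),
    ("evidence freshness and relevance", lambda r: r.get("evidence_fresh", False)),
    ("role handoffs", lambda r: r.get("handoffs_preserve_constraints", False)),
    ("disagreement resolution", lambda r: r.get("disagreement_resolved", False)),
    ("evidence traceability", lambda r: r.get("decision_traceable", False)),
]


def triage(run: Run) -> List[str]:
    """Return the names of all failing layers, earliest layer first."""
    return [name for name, passed in TRIAGE_ORDER if not passed(run)]


example_run = {
    "inputs_match_task": True,
    "evidence_fresh": False,
    "handoffs_preserve_constraints": True,
    "disagreement_resolved": False,
    "decision_traceable": True,
}
failures = triage(example_run)
# Start debugging at failures[0]; tune prompts only when the list is empty.
print(failures)
```

A missing flag counts as a failure here, on the assumption that an unverified layer should not be treated as healthy.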

No.1 Data reality mismatch

What it means

The system is grounded in the wrong asset, wrong timeframe, wrong market session, or wrong event background.

What it looks like in TradingAgents

An analyst discusses a catalyst, earnings event, or price regime that does not match the ticker or time horizon of the current run.

Check first

ticker and instrument mapping
date range and session boundaries
alignment between retrieved inputs and the actual task
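
These checks can run as a guard before any analyst sees the retrieved material. A minimal sketch, assuming each retrieved item carries a symbol and a publication date in its metadata; the `Doc` type, its field names, and the sample symbols are hypothetical, not a TradingAgents API.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class Doc:
    """Hypothetical retrieved item with the metadata this check needs."""
    symbol: str
    published: date
    text: str


def mismatched_docs(docs: List[Doc], symbol: str, start: date, end: date) -> List[Doc]:
    """Return docs whose symbol or publication date falls outside the task."""
    return [
        d for d in docs
        if d.symbol.upper() != symbol.upper() or not (start <= d.published <= end)
    ]


# Made-up sample data for illustration only.
docs = [
    Doc("ABC", date(2024, 5, 2), "current earnings recap"),
    Doc("ABC", date(2023, 5, 4), "last year's earnings recap"),  # outside the window
    Doc("ABCD", date(2024, 5, 2), "similar-looking ticker"),     # wrong instrument
]
bad = mismatched_docs(docs, "ABC", date(2024, 4, 25), date(2024, 5, 9))
print([d.text for d in bad])
```

Session boundaries are not modeled here; a date window is the simplest version of the same idea.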

No.2 Interpretation collapse

What it means

The retrieved material is relevant, but the interpretation built on top of it is wrong.

What it looks like in TradingAgents

The system retrieves the right earnings, macro, or market data, but draws the wrong conclusion from it.

Check first

whether the evidence actually supports the conclusion
whether the role is summarizing or inferring
whether the reasoning step compresses away key conditions

No.3 Long reasoning chain drift

What it means

A multi-step workflow slowly shifts away from the original task.

What it looks like in TradingAgents

A short-horizon trading task turns into a general company summary, or a portfolio-level question gets treated like a single-name research memo.

Check first

task framing for each role
time horizon consistency
handoff wording between planner, analysts, trader, and portfolio manager
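
Time horizon consistency is easy to check mechanically if every handoff carries the horizon as an explicit field. A minimal sketch under that assumption; the payload shape and the `horizon` key are hypothetical, and the role names are only examples.

```python
from typing import Dict, List

# Hypothetical handoff payloads, one per role, in pipeline order.
Handoff = Dict[str, str]


def horizon_drift(task_horizon: str, handoffs: List[Handoff]) -> List[str]:
    """Return roles whose handoff dropped or changed the task's time horizon."""
    return [h["role"] for h in handoffs if h.get("horizon") != task_horizon]


handoffs = [
    {"role": "planner", "horizon": "intraday"},
    {"role": "analyst", "horizon": "intraday"},
    {"role": "trader"},                                       # horizon dropped
    {"role": "portfolio_manager", "horizon": "multi-quarter"},  # horizon changed
]
print(horizon_drift("intraday", handoffs))
```

The first role in the returned list is usually the place to inspect the handoff wording.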

No.4 Bluffing and overconfidence

What it means

The system sounds more certain than the evidence allows.

What it looks like in TradingAgents

A strong buy or sell recommendation appears even when the evidence is mixed, thin, or partly generic.

Check first

confidence language
minimum evidence threshold
whether uncertainty can be expressed without collapsing into vague text
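
A minimum evidence threshold can be enforced outside the model, after the recommendation is produced. The sketch below is one possible policy, not a rule from TradingAgents: the threshold of three items, the label names, and the function are all assumptions. It keeps the direction of the call and states why confidence was lowered, so the output does not collapse into vague text.

```python
from typing import List

# Assumed policy values; tune them for your own workflow.
MIN_EVIDENCE_FOR_STRONG = 3
STRONG_LABELS = {"strong buy", "strong sell"}


def gate_recommendation(label: str, supporting: List[str], conflicting: List[str]) -> str:
    """Downgrade a strong label when support is thin or evidence is mixed."""
    if label.lower() not in STRONG_LABELS:
        return label
    thin = len(supporting) < MIN_EVIDENCE_FOR_STRONG
    mixed = len(conflicting) > 0
    if thin or mixed:
        side = label.lower().split()[-1]  # "buy" or "sell"
        reason = "thin support" if thin else "mixed evidence"
        return f"{side} (low confidence: {reason})"
    return label


print(gate_recommendation("strong buy", ["doc-1"], []))
```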

No.5 Semantic mismatch in retrieval

What it means

The retrieval step returns content that looks similar on the surface but is not truly relevant.

What it looks like in TradingAgents

The run pulls related sector commentary, ETF discussion, or nearby company context instead of the most decision-critical evidence for the current symbol.

Check first

retrieval filters
ranking logic
symbol disambiguation and query formulation

No.6 Logic collapse and recovery failure

What it means

The reasoning path gets stuck, contradicts itself, or hits a dead end without a controlled reset.

What it looks like in TradingAgents

The system starts building a trade thesis, encounters conflicting evidence, then produces a vague compromise instead of a clean resolution.

Check first

contradiction handling
conflict resolution logic
whether the workflow has a safe fallback when reasoning breaks

No.7 Memory breaks across sessions

What it means

Important context is lost, mixed, or silently overwritten across steps or sessions.

What it looks like in TradingAgents

Earlier assumptions about risk, event windows, or market context disappear in later stages, or stale prior-run context leaks into the new run.

Check first

session boundaries
memory reset behavior
whether prior summaries are intentionally reused or accidentally carried forward
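
One way to make reuse intentional is to key all memory by a run identifier and require an explicit call to carry anything forward. A minimal sketch, assuming the workflow can pass a `run_id` to its memory layer; the `RunScopedMemory` class is hypothetical and does not describe how TradingAgents stores memory.

```python
from collections import defaultdict
from typing import Dict, List


class RunScopedMemory:
    """Hypothetical memory store that keys every note by run_id."""

    def __init__(self) -> None:
        self._notes: Dict[str, List[str]] = defaultdict(list)

    def remember(self, run_id: str, note: str) -> None:
        self._notes[run_id].append(note)

    def recall(self, run_id: str) -> List[str]:
        # Only notes written during this run; nothing leaks in by default.
        return list(self._notes.get(run_id, []))

    def carry_forward(self, from_run: str, to_run: str) -> None:
        # Reuse is an explicit, labeled act rather than a side effect.
        # Iterate over a copy so from_run == to_run cannot loop forever.
        for note in list(self._notes.get(from_run, [])):
            self._notes[to_run].append(f"[carried from {from_run}] {note}")


memory = RunScopedMemory()
memory.remember("run-1", "risk cap 2%")
print(memory.recall("run-2"))  # empty: the new run starts clean
memory.carry_forward("run-1", "run-2")
print(memory.recall("run-2"))
```

The label on carried notes makes stale context visible in later prompts instead of blending in.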

No.8 Debugging as a black box

What it means

A bad result appears, but the failure path is not visible enough to inspect.

What it looks like in TradingAgents

The final answer is clearly wrong, but there is no easy way to see which role, retrieval step, or tool output caused the divergence.

Check first

intermediate role outputs
evidence traces
whether the pipeline preserves enough diagnostics to locate the break
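
A small per-run trace is often enough to locate the break. The sketch below assumes each role can report its output together with the identifiers of the evidence it used; the `Trace` and `Step` types are hypothetical and not an existing TradingAgents interface.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Step:
    role: str
    output: str
    evidence_ids: List[str] = field(default_factory=list)


@dataclass
class Trace:
    """Hypothetical per-run trace of intermediate role outputs."""
    steps: List[Step] = field(default_factory=list)

    def record(self, role: str, output: str,
               evidence_ids: Optional[List[str]] = None) -> None:
        self.steps.append(Step(role, output, list(evidence_ids or [])))

    def unsupported_roles(self) -> List[str]:
        """Roles that produced output without citing any evidence."""
        return [s.role for s in self.steps if not s.evidence_ids]


trace = Trace()
trace.record("news_analyst", "guidance was raised", ["doc-7"])
trace.record("trader", "buy")
print(trace.unsupported_roles())
```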

No.9 Entropy collapse in outputs

What it means

The output becomes noisy, repetitive, unstable, or structurally incoherent.

What it looks like in TradingAgents

Analyst summaries repeat the same point, mix unrelated ideas, or lose clear structure as the run grows longer.

Check first

summarization compression
output formatting constraints
whether upstream noise is being propagated downstream

No.10 Creative freeze in analysis

What it means

The system becomes too literal and fails to synthesize when synthesis is needed.

What it looks like in TradingAgents

Instead of connecting catalysts, risk, timing, and market structure into a useful decision frame, the output stays flat and list-like.

Check first

whether the prompt asks only for summary
whether roles are allowed to synthesize across signals
whether the workflow over-optimizes for safe repetition

No.11 Symbolic collapse

What it means

The system breaks when asked to maintain structured, abstract, or logical relationships across variables.

What it looks like in TradingAgents

Position sizing logic, scenario trees, or conditional reasoning breaks when multiple variables must stay aligned.

Check first

whether abstract conditions are represented explicitly
whether the model is asked to hold too many implicit variables at once
whether structure is lost during natural-language compression

No.12 Philosophical recursion

What it means

The workflow falls into self-reference, circular evaluation, or paradox-like loops.

What it looks like in TradingAgents

One role justifies its output by citing another role whose output was itself derived from the first role’s assumptions, creating circular confidence.

Check first

circular dependencies between roles
whether evaluation is independent from generation
whether justification chains loop back to their own source

No.13 Multi-agent chaos

What it means

Multiple agents conflict, overwrite, or misalign without a clear reconciliation path.

What it looks like in TradingAgents

Bullish and bearish researchers disagree, the trader implicitly ignores one side, and the portfolio manager approves a recommendation without resolving the mismatch.

Check first

role boundaries
aggregation logic
whether disagreement is made explicit before the final decision
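
Aggregation logic can refuse to finalize a contested decision that has no stated resolution. A minimal sketch, assuming each role reports a single stance string; the `Decision` type, the `aggregate` function, and the role names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Decision:
    action: str
    dissent: Dict[str, str]    # role -> stance that did not win
    resolution: Optional[str]  # why the winning side won


def aggregate(stances: Dict[str, str], proposed: str,
              resolution: Optional[str] = None) -> Decision:
    """Refuse to finalize a contested decision that has no stated resolution."""
    dissent = {role: s for role, s in stances.items() if s != proposed}
    if dissent and not resolution:
        raise ValueError(f"unresolved disagreement from: {sorted(dissent)}")
    return Decision(proposed, dissent, resolution)


stances = {"bull_researcher": "buy", "bear_researcher": "sell"}
decision = aggregate(stances, "buy", resolution="bear case relies on stale guidance")
print(decision.dissent)
```

Keeping the dissent on the decision record means the final answer can explain why one side won.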

No.14 Bootstrap ordering failures

What it means

A component fires before the dependencies, tools, context, or setup it needs are ready.

What it looks like in TradingAgents

A role begins analysis before price data, news retrieval, or prior role outputs have completed or stabilized.

Check first

stage order
tool readiness
dependency availability at each workflow boundary
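
Dependency availability can be checked at each boundary before a role is allowed to start. The sketch assumes the orchestrator tracks a set of completed inputs; the `REQUIRES` map and its role and input names are hypothetical examples, not the actual TradingAgents graph.

```python
from typing import Dict, List, Set

# Hypothetical dependency map: role -> inputs that must exist before it runs.
REQUIRES: Dict[str, List[str]] = {
    "market_analyst": ["price_data"],
    "news_analyst": ["news_retrieval"],
    "trader": ["market_analyst", "news_analyst"],
}


def missing_dependencies(role: str, completed: Set[str]) -> List[str]:
    """Return the inputs a role still needs; an empty list means it may start."""
    return [dep for dep in REQUIRES.get(role, []) if dep not in completed]


completed = {"price_data", "market_analyst"}
print(missing_dependencies("trader", completed))
```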

No.15 Deployment deadlock assumptions

What it means

The system silently depends on circular waits or incompatible stage assumptions.

What it looks like in TradingAgents

One stage expects validated context from another stage that is itself waiting for the first stage’s output.

Check first

dependency graph between stages
blocking assumptions
whether the orchestration logic forces circular waiting
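
If the stage dependencies can be written down as a graph, circular waits can be detected before anything runs. A minimal sketch using `graphlib` from the Python standard library (Python 3.9 or later); the stage names and the `find_circular_wait` helper are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Optional, Set


def find_circular_wait(waits_on: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Return one cycle of stages if the wait graph has any, else None."""
    sorter = TopologicalSorter(waits_on)
    try:
        sorter.prepare()  # raises CycleError when stages wait on each other
    except CycleError as err:
        return list(err.args[1])  # the cycle; first and last node are the same
    return None


# Hypothetical stages: each waits for the other's output.
waits_on = {"validator": {"trader"}, "trader": {"validator"}}
print(find_circular_wait(waits_on))
```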

No.16 Pre-deploy collapse

What it means

The first real run fails because setup assumptions were incomplete.

What it looks like in TradingAgents

A live-like or replay run breaks because of missing environment variables, version skew, incomplete tool configuration, or wrong first-call assumptions.

Check first

environment setup completeness
tool and dependency versions
first-run assumptions that were never validated in practice
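
A preflight script run before the first live-like run can catch most of these. The sketch below uses only the Python standard library; the variable names and version pins you pass in are your own, and the example names here are hypothetical.

```python
import os
from importlib import metadata
from typing import Dict, Iterable, List, Mapping, Optional


def missing_env(required: Iterable[str],
                env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


def version_skew(pins: Mapping[str, str]) -> Dict[str, str]:
    """Map package -> installed version where it differs from the pin."""
    skew: Dict[str, str] = {}
    for pkg, wanted in pins.items():
        try:
            found = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            found = "not installed"
        if found != wanted:
            skew[pkg] = found
    return skew


# Hypothetical variable names; substitute the ones your tools actually read.
print(missing_env(["LLM_API_KEY", "MARKET_DATA_API_KEY"], {"LLM_API_KEY": "set"}))
```

Passing `env` explicitly keeps the check testable; leaving it out reads the real environment.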

Quick triage order

When a result looks wrong, this order is usually safer than tweaking prompts first:

1. verify ticker, symbol, timeframe, and market session
2. verify retrieval freshness and evidence relevance
3. verify role handoffs and disagreement handling
4. verify memory boundaries across runs
5. verify evidence-to-decision traceability
6. verify risk constraints and execution assumptions
7. only then tune prompts or output style

Scope note

This checklist is for diagnosing information flow, reasoning, coordination, and workflow failures in multi-agent trading systems.

It does not decide whether a strategy is profitable, whether a portfolio is optimal, or whether a market view is objectively correct.

Its purpose is to help teams find where a seemingly intelligent workflow is failing before they spend time fixing the wrong layer.
