RAG Failure Checklist for Multi-Agent Trading Workflows
A compact debugging checklist for multi-agent trading systems that use LLMs, retrieval, tools, role-based analysts, and staged decision handoffs.
This page is designed for cases where the system sounds fluent, but the output is clearly wrong, inconsistent, fragile, or difficult to trust.
What this page is for
Use this checklist when you see symptoms like:
- analysts disagreeing but the final decision does not explain why one side won
- a trade recommendation that cannot be traced back to clear evidence
- stale context leaking into a new run
- a polished explanation built on weak or partial support
- role handoffs that lose constraints, caveats, or key market assumptions
This page is not a strategy guide, a backtest framework, or a portfolio construction manual.
Its purpose is simpler:
- improve the first diagnostic cut
- reduce wasted debugging cycles
- help identify which layer is failing before editing prompts blindly
Quick links
- No.1 Data reality mismatch
- No.2 Interpretation collapse
- No.3 Long reasoning chain drift
- No.4 Bluffing and overconfidence
- No.5 Semantic mismatch in retrieval
- No.6 Logic collapse and recovery failure
- No.7 Memory breaks across sessions
- No.8 Debugging as a black box
- No.9 Entropy collapse in outputs
- No.10 Creative freeze in analysis
- No.11 Symbolic collapse
- No.12 Philosophical recursion
- No.13 Multi-agent chaos
- No.14 Bootstrap ordering failures
- No.15 Deployment deadlock assumptions
- No.16 Pre-deploy collapse
Summary table
| # | problem domain | what breaks | quick link |
|---|---|---|---|
| 1 | Data reality mismatch | retrieval brings the wrong asset, wrong regime, or wrong context | No.1 |
| 2 | Interpretation collapse | the retrieved material is relevant, but the logic built on top of it is wrong | No.2 |
| 3 | Long reasoning chain drift | multi-step analysis slowly shifts away from the original task | No.3 |
| 4 | Bluffing and overconfidence | the model sounds certain without enough support | No.4 |
| 5 | Semantic mismatch in retrieval | surface similarity replaces true relevance | No.5 |
| 6 | Logic collapse and recovery failure | the reasoning path hits a dead end but does not recover safely | No.6 |
| 7 | Memory breaks across sessions | earlier context or assumptions are lost or mixed | No.7 |
| 8 | Debugging as a black box | a bad output appears, but the failure path is not visible | No.8 |
| 9 | Entropy collapse in outputs | outputs become noisy, repetitive, or structurally incoherent | No.9 |
| 10 | Creative freeze in analysis | the system gives flat and literal analysis when synthesis is needed | No.10 |
| 11 | Symbolic collapse | reasoning breaks when several abstract or conditional relationships must stay aligned | No.11 |
| 12 | Philosophical recursion | self-referential loops or paradox-like reasoning trap the workflow | No.12 |
| 13 | Multi-agent chaos | agent roles overwrite, conflict, or misalign without resolution | No.13 |
| 14 | Bootstrap ordering failures | components act before dependencies or context are ready | No.14 |
| 15 | Deployment deadlock assumptions | system stages wait on each other in the wrong order | No.15 |
| 16 | Pre-deploy collapse | version skew, missing secrets, or incomplete setup breaks the first live run | No.16 |
How to use this checklist
When a run goes wrong, do not start by rewriting prompts.
Use this order first:
- verify that the input reality is correct
- verify that the retrieved evidence is fresh and relevant
- verify that role handoffs preserve assumptions and constraints
- verify that disagreement is resolved explicitly
- verify that the final answer can be traced back to evidence
- only then tune prompts, wording, or style
Multiple failure patterns may appear at the same time. The goal is not to force everything into one box. The goal is to avoid the wrong first cut.
No.1 Data reality mismatch
What it means
The system is grounded in the wrong asset, wrong timeframe, wrong market session, or wrong event background.
What it looks like in TradingAgents
An analyst discusses a catalyst, earnings event, or price regime that does not match the ticker or time horizon of the current run.
Check first
- ticker and instrument mapping
- date range and session boundaries
- alignment between retrieved inputs and the actual task
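A minimal sketch of a pre-analysis guard for the first two checks, assuming each retrieved item carries `symbol` and `timestamp` metadata. `RetrievedItem` and `check_grounding` are hypothetical names, not TradingAgents APIs, and market-session checks would additionally need an exchange calendar, which this sketch does not model.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class RetrievedItem:
    # Hypothetical retrieval record; real pipelines may store this metadata differently.
    symbol: str
    timestamp: datetime
    text: str


def check_grounding(items, symbol, start, end):
    """Return human-readable problems for items that do not match the run's ticker or window."""
    problems = []
    for i, item in enumerate(items):
        # Compare case-insensitively so "aapl" and "AAPL" are treated as the same ticker.
        if item.symbol.upper() != symbol.upper():
            problems.append(f"item {i}: symbol {item.symbol} != {symbol}")
        # The run window is inclusive at both ends in this sketch.
        if not (start <= item.timestamp <= end):
            problems.append(f"item {i}: {item.timestamp:%Y-%m-%d} outside run window")
    return problems


items = [
    RetrievedItem("AAPL", datetime(2024, 5, 2), "earnings beat"),
    RetrievedItem("MSFT", datetime(2024, 5, 2), "cloud growth"),
]
print(check_grounding(items, "AAPL", datetime(2024, 5, 1), datetime(2024, 5, 3)))
# → ['item 1: symbol MSFT != AAPL']
```

Running a guard like this before any analyst sees the evidence turns a subtle grounding error into an explicit, loggable failure.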
No.2 Interpretation collapse
What it means
The retrieved material is relevant, but the interpretation built on top of it is wrong.
What it looks like in TradingAgents
The system retrieves the right earnings, macro, or market data, but draws the wrong conclusion from it.
Check first
- whether the evidence actually supports the conclusion
- whether the role is summarizing versus inferring
- whether the reasoning step compresses away key conditions
No.3 Long reasoning chain drift
What it means
A multi-step workflow slowly shifts away from the original task.
What it looks like in TradingAgents
A short-horizon trading task turns into a general company summary, or a portfolio-level question gets treated like a single-name research memo.
Check first
- task framing for each role
- time horizon consistency
- handoff wording between planner, analysts, trader, and portfolio manager
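One way to make drift visible is to freeze the task framing once per run and compare every role's declared horizon and scope against it at each handoff. This is a sketch under that assumption; `TaskSpec`, `RoleOutput`, and `check_handoff` are hypothetical, and it presumes roles are asked to state their horizon and scope explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskSpec:
    # Immutable framing created once per run; frozen=True blocks accidental edits mid-chain.
    symbol: str
    horizon: str   # e.g. "intraday", "1-5 days", "quarter"
    scope: str     # e.g. "single-name trade" or "portfolio"


@dataclass
class RoleOutput:
    role: str
    horizon: str   # the horizon the role says it analysed
    scope: str     # the scope the role says it analysed
    body: str


def check_handoff(spec, output):
    """List the ways a role's output has drifted from the run's original framing."""
    drift = []
    if output.horizon != spec.horizon:
        drift.append(f"{output.role}: horizon {output.horizon!r} != {spec.horizon!r}")
    if output.scope != spec.scope:
        drift.append(f"{output.role}: scope {output.scope!r} != {spec.scope!r}")
    return drift
```

An empty list means the handoff kept the framing; anything else can block the next stage or be logged next to the role's output.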
No.4 Bluffing and overconfidence
What it means
The system sounds more certain than the evidence allows.
What it looks like in TradingAgents
A strong buy or sell recommendation appears even when the evidence is mixed, thin, or partly generic.
Check first
- confidence language
- minimum evidence threshold
- whether uncertainty can be expressed without collapsing into vague text
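A minimum evidence threshold can be enforced outside the model by capping how strong a recommendation may be. This sketch assumes evidence items are already tagged with a direction and a "specific to this symbol" flag; `Evidence`, `cap_recommendation`, and both threshold values are illustrative assumptions, not tuned settings.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    source: str
    direction: int   # +1 supports buying, -1 supports selling, 0 neutral
    specific: bool   # False for generic commentary that is not about this symbol


def cap_recommendation(proposed, evidence, min_items=3, min_agreement=0.7):
    """Downgrade STRONG_BUY / STRONG_SELL to BUY / SELL when specific evidence is thin or mixed.

    min_items and min_agreement are illustrative thresholds, not recommended values.
    """
    if proposed not in ("STRONG_BUY", "STRONG_SELL"):
        return proposed
    softened = proposed[len("STRONG_"):]
    side = 1 if proposed == "STRONG_BUY" else -1
    # Generic or neutral items do not count toward the evidence threshold.
    usable = [e for e in evidence if e.specific and e.direction != 0]
    if len(usable) < min_items:
        return softened
    agreement = sum(1 for e in usable if e.direction == side) / len(usable)
    return proposed if agreement >= min_agreement else softened
```

Keeping this rule in code means the confidence language in the final text can be checked against a number rather than against tone.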
No.5 Semantic mismatch in retrieval
What it means
The retrieval step returns content that looks similar on the surface but is not truly relevant.
What it looks like in TradingAgents
The run pulls related sector commentary, ETF discussion, or nearby company context instead of the most decision-critical evidence for the current symbol.
Check first
- retrieval filters
- ranking logic
- symbol disambiguation and query formulation
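A sketch of symbol disambiguation plus a metadata filter applied before similarity ranking, so sector or ETF pieces cannot outrank symbol-specific evidence. It assumes each hit is a dict with a `symbols` list and a `score`; the alias table and both function names are hypothetical.

```python
# Hypothetical alias table; a real system would load this from reference data.
ALIASES = {
    "apple": {"AAPL"},
    "alphabet": {"GOOGL", "GOOG"},
}


def resolve_symbol(name):
    """Map a company name or ticker to exactly one symbol, refusing ambiguous names."""
    candidates = ALIASES.get(name.lower(), {name.upper()})
    if len(candidates) != 1:
        raise ValueError(f"ambiguous name {name!r}: {sorted(candidates)}")
    return next(iter(candidates))


def rerank_for_symbol(hits, symbol, top_k=5):
    """Keep only hits tagged with the run's symbol, then rank them by similarity score.

    Hits without the exact symbol tag are dropped rather than ranked, however similar they look.
    """
    exact = [h for h in hits if symbol in h.get("symbols", ())]
    return sorted(exact, key=lambda h: h["score"], reverse=True)[:top_k]
```

The design choice is to filter first and rank second: a high similarity score should never be able to compensate for the wrong symbol.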
No.6 Logic collapse and recovery failure
What it means
The reasoning path gets stuck, contradicts itself, or hits a dead end without a controlled reset.
What it looks like in TradingAgents
The system starts building a trade thesis, encounters conflicting evidence, then produces a vague compromise instead of a clean resolution.
Check first
- contradiction handling
- conflict resolution logic
- whether the workflow has a safe fallback when reasoning breaks
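A safe fallback can be as simple as bounding the number of resolution attempts and then failing closed with an explicit `NO_TRADE` that records what stayed unresolved. In this sketch, `resolve` is a hypothetical callable standing in for whatever step reconciles conflicting evidence, and the `NO_TRADE` label is an assumption rather than part of any existing action set.

```python
from dataclasses import dataclass, field


@dataclass
class Decision:
    action: str                                  # "BUY", "SELL", "HOLD" or "NO_TRADE"
    reason: str
    unresolved: list = field(default_factory=list)


def decide_with_fallback(resolve, conflicts, max_attempts=2):
    """Try to resolve conflicts a bounded number of times, else fail closed.

    `resolve` is a hypothetical callable that returns a Decision or raises
    ValueError when it cannot reconcile the conflicting evidence.
    """
    last_error = "no attempts made"
    for attempt in range(1, max_attempts + 1):
        try:
            decision = resolve(conflicts)
        except ValueError as exc:
            last_error = f"attempt {attempt}: {exc}"
            continue
        if decision.action in {"BUY", "SELL", "HOLD"}:
            return decision
        last_error = f"attempt {attempt}: invalid action {decision.action!r}"
    # Explicit, inspectable fallback instead of a vague compromise.
    return Decision("NO_TRADE", f"unresolved conflict ({last_error})", list(conflicts))
```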
No.7 Memory breaks across sessions
What it means
Important context is lost, mixed, or silently overwritten across steps or sessions.
What it looks like in TradingAgents
Earlier assumptions about risk, event windows, or market context disappear in later stages, or stale prior-run context leaks into the new run.
Check first
- session boundaries
- memory reset behavior
- whether prior summaries are intentionally reused or accidentally carried forward
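A minimal sketch of run-scoped memory, assuming each run gets its own store and anything from an earlier run must be imported by name. `RunMemory` is a hypothetical class; the point is that reuse becomes deliberate and auditable instead of implicit.

```python
import uuid


class RunMemory:
    """Per-run memory: nothing from a previous run is visible unless imported explicitly."""

    def __init__(self, run_id=None):
        self.run_id = run_id or uuid.uuid4().hex
        self._store = {}
        self.imported_keys = set()   # which entries were deliberately carried forward

    def put(self, key, value):
        self._store[key] = value

    def get(self, key):
        # A KeyError surfaces missing context instead of silently reusing stale data.
        return self._store[key]

    def carry_forward(self, previous, keys):
        """Copy only the named keys from a previous run, and remember that we did."""
        for key in keys:
            self._store[key] = previous.get(key)
            self.imported_keys.add(key)
```

Logging `imported_keys` alongside each run answers the last check directly: prior context was either reused on purpose or not at all.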
No.8 Debugging as a black box
What it means
A bad result appears, but the failure path is not visible enough to inspect.
What it looks like in TradingAgents
The final answer is clearly wrong, but there is no easy way to see which role, retrieval step, or tool output caused the divergence.
Check first
- intermediate role outputs
- evidence traces
- whether the pipeline preserves enough diagnostics to locate the break
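Preserving diagnostics can start with a decorator that records each role's inputs, output or error, and duration. This is a sketch assuming roles are plain Python callables; `traced` and the in-memory `TRACE` list are hypothetical, and a real system would more likely write to a file or a tracing backend.

```python
import functools
import time

TRACE = []  # hypothetical in-process trace store


def traced(role):
    """Record each call's inputs, output or error, and duration under the given role name."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            entry = {"role": role, "args": repr(args), "kwargs": repr(kwargs)}
            start = time.perf_counter()
            try:
                result = fn(*args, **kwargs)
                entry["output"] = repr(result)
                return result
            except Exception as exc:
                entry["error"] = repr(exc)
                raise
            finally:
                # Runs on both success and failure, so every call leaves a trace entry.
                entry["seconds"] = round(time.perf_counter() - start, 4)
                TRACE.append(entry)
        return inner
    return wrap


@traced("news_analyst")
def news_analyst(symbol):
    # Stand-in for a real LLM-backed role.
    return f"{symbol}: no material news"
```

With every role wrapped, a wrong final answer can be walked back entry by entry to the first output that diverged.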
No.9 Entropy collapse in outputs
What it means
The output becomes noisy, repetitive, unstable, or structurally incoherent.
What it looks like in TradingAgents
Analyst summaries repeat the same point, mix unrelated ideas, or lose clear structure as the run grows longer.
Check first
- summarization compression
- output formatting constraints
- whether upstream noise is being propagated downstream
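A crude repetition signal can be computed before a summary is passed downstream. This sketch only catches exact repeated sentences after normalizing case and whitespace; it misses paraphrases, and its naive splitter also breaks on decimals such as "3.5%". Any threshold for rejecting a summary is an assumption to tune per workflow.

```python
import re


def repetition_ratio(text):
    """Fraction of sentences that repeat an earlier sentence, after normalizing case and spacing."""
    # Naive sentence split on terminal punctuation; good enough for a smoke test.
    sentences = [s.strip().lower() for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    normalized = [" ".join(s.split()) for s in sentences]
    return 1 - len(set(normalized)) / len(normalized)
```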
No.10 Creative freeze in analysis
What it means
The system becomes too literal and fails to synthesize when synthesis is needed.
What it looks like in TradingAgents
Instead of connecting catalysts, risk, timing, and market structure into a useful decision frame, the output stays flat and list-like.
Check first
- whether the prompt asks only for summary
- whether roles are allowed to synthesize across signals
- whether the workflow over-optimizes for safe repetition
No.11 Symbolic collapse
What it means
The system breaks when asked to maintain structured, abstract, or logical relationships across variables.
What it looks like in TradingAgents
Position sizing logic, scenario trees, or conditional reasoning breaks when multiple variables must stay aligned.
Check first
- whether abstract conditions are represented explicitly
- whether the model is asked to hold too many implicit variables at once
- whether structure is lost during natural-language compression
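Representing conditions explicitly can mean keeping scenario fields and sizing arithmetic in code, so the model only chooses inputs. The sketch below uses standard fixed-fractional risk sizing (shares = equity × risk fraction ÷ per-share risk) as an example; it is not a method taken from TradingAgents, and `Scenario`, `position_size`, and `check_scenarios` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    # Each condition is an explicit field rather than a clause buried in prose.
    name: str
    entry: float
    stop: float
    probability: float


def position_size(equity, risk_fraction, entry, stop):
    """Shares such that hitting the stop loses at most equity * risk_fraction (fixed-fractional sizing)."""
    per_share_risk = abs(entry - stop)
    if per_share_risk == 0:
        raise ValueError("entry and stop must differ")
    return math.floor(equity * risk_fraction / per_share_risk)


def check_scenarios(scenarios, tol=1e-6):
    """Scenario probabilities must sum to 1 so the tree stays internally consistent."""
    total = sum(s.probability for s in scenarios)
    if abs(total - 1.0) > tol:
        raise ValueError(f"scenario probabilities sum to {total}, not 1")
    return total


print(position_size(100_000, 0.01, entry=50.0, stop=48.0))  # → 500
```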
No.12 Philosophical recursion
What it means
The workflow falls into self-reference, circular evaluation, or paradox-like loops.
What it looks like in TradingAgents
One role justifies its output by citing another role that was itself derived from the first role’s assumptions, creating circular confidence.
Check first
- circular dependencies between roles
- whether evaluation is independent from generation
- whether justification chains loop back to their own source
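If each role records which other roles' outputs it relied on, circular confidence becomes a cycle in a "who cites whom" graph. This sketch finds one such cycle with a depth-first search using white/grey/black marking; the `cites` mapping and role names are hypothetical.

```python
def find_justification_cycle(cites):
    """Return one cycle in a 'who cites whom' graph as a list of roles, or None.

    `cites` maps each role to the roles whose output it relies on.
    """
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on the current path, fully explored
    color = {}
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for nxt in cites.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GREY:
                # nxt is already on the current path, so the path from nxt onward is a loop.
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in list(cites):
        if color.get(node, WHITE) == WHITE:
            found = visit(node)
            if found:
                return found
    return None


print(find_justification_cycle({"trader": ["bull"], "bull": ["risk"], "risk": ["bull"]}))
# → ['bull', 'risk', 'bull']
```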
No.13 Multi-agent chaos
What it means
Multiple agents conflict, overwrite, or misalign without a clear reconciliation path.
What it looks like in TradingAgents
Bullish and bearish researchers disagree, the trader implicitly ignores one side, and the portfolio manager approves a recommendation without resolving the mismatch.
Check first
- role boundaries
- aggregation logic
- whether disagreement is made explicit before the final decision
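Making disagreement explicit can be enforced by refusing to finalize while opposing views exist and no reasoned resolution has been supplied. This is a sketch; `Stance`, `Resolution`, and `finalize` are hypothetical, and the bullish/bearish/neutral vocabulary is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Stance:
    role: str
    view: str        # "bullish", "bearish" or "neutral"
    key_claim: str


@dataclass
class Resolution:
    winner: str      # the view that prevailed
    rationale: str   # why the other side's key claim was outweighed


def finalize(stances, resolution=None):
    """Refuse to finalize while opposing views exist without an explicit, reasoned resolution."""
    views = {s.view for s in stances if s.view != "neutral"}
    if len(views) <= 1:
        return {"decision": next(iter(views), "neutral"), "disagreement": []}
    if resolution is None or not resolution.rationale.strip():
        raise ValueError(f"unresolved disagreement between {sorted(views)}")
    if resolution.winner not in views:
        raise ValueError(f"winner {resolution.winner!r} is not one of {sorted(views)}")
    overruled = [s for s in stances if s.view not in ("neutral", resolution.winner)]
    return {
        "decision": resolution.winner,
        # The losing side's claims stay in the record so reviewers can see what was outweighed.
        "disagreement": [f"{s.role}: {s.key_claim}" for s in overruled],
        "rationale": resolution.rationale,
    }
```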
No.14 Bootstrap ordering failures
What it means
A component fires before the dependencies, tools, context, or setup it needs are ready.
What it looks like in TradingAgents
A role begins analysis before price data, news retrieval, or prior role outputs have completed or stabilized.
Check first
- stage order
- tool readiness
- dependency availability at each workflow boundary
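Stage ordering can be driven by an explicit dependency graph so that no role starts before its inputs exist. This sketch uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+); the `stages` and `deps` structures are hypothetical and not how TradingAgents wires its own graph.

```python
from graphlib import TopologicalSorter


def run_pipeline(stages, deps):
    """Run stages in dependency order, handing each one the outputs it depends on.

    `stages` maps a stage name to a callable taking a dict of upstream outputs;
    `deps` maps a stage name to the set of stages that must finish first.
    """
    graph = {name: set(deps.get(name, ())) for name in stages}
    sorter = TopologicalSorter(graph)
    sorter.prepare()   # raises graphlib.CycleError if the dependencies contain a cycle
    outputs = {}
    while sorter.is_active():
        for name in sorter.get_ready():
            # Every upstream stage is guaranteed to be done before this stage is ready.
            upstream = {d: outputs[d] for d in deps.get(name, ())}
            result = stages[name](upstream)
            if result is None:
                raise RuntimeError(f"stage {name!r} produced no output")
            outputs[name] = result
            sorter.done(name)
    return outputs
```

Treating a `None` result as an error is a deliberate choice: a stage that "finished" without output should not unblock the stages that depend on it.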
No.15 Deployment deadlock assumptions
What it means
The system silently depends on circular waits or incompatible stage assumptions.
What it looks like in TradingAgents
One stage expects validated context from another stage that is itself waiting for the first stage’s output.
Check first
- dependency graph between stages
- blocking assumptions
- whether the orchestration logic forces circular waiting
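Circular waiting can be caught before launch by checking the stage dependency graph for cycles. This sketch relies on `graphlib.CycleError`, whose second `args` element is documented as the detected cycle with the same node first and last; `check_for_deadlock` and the `waits_on` mapping are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def check_for_deadlock(waits_on):
    """Return a cycle of stages that wait on each other, or None if a valid order exists.

    `waits_on` maps each stage to the stages whose output it blocks on.
    """
    try:
        list(TopologicalSorter(waits_on).static_order())
    except CycleError as exc:
        return exc.args[1]   # list of nodes in the cycle; first and last are the same
    return None
```

Running this at startup turns a silent hang into an immediate, named error.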
No.16 Pre-deploy collapse
What it means
The first real run fails because setup assumptions were incomplete.
What it looks like in TradingAgents
A live-like or replay run breaks because of missing environment variables, version skew, incomplete tool configuration, or wrong first-call assumptions.
Check first
- environment setup completeness
- tool and dependency versions
- first-run assumptions that were never validated in practice
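A preflight check can collect setup problems before the first live or replay run instead of failing mid-run. This sketch uses the standard library's `importlib.metadata.version` and does exact string comparison against pinned versions (version ranges would need something like the `packaging` library). The function name and its parameters are hypothetical; which environment variables and packages to list depends on your deployment.

```python
import os
from importlib.metadata import PackageNotFoundError, version


def preflight(required_env, pinned_versions, env=None, get_version=version):
    """Return a list of setup problems; an empty list means the checked assumptions hold.

    required_env: names of environment variables the run needs (for example API keys).
    pinned_versions: {distribution_name: expected_version}, e.g. taken from a lockfile.
    """
    env = os.environ if env is None else env
    # Empty strings count as missing: a blank key fails just as badly at first call.
    problems = [f"missing env var {name}" for name in required_env if not env.get(name)]
    for dist, expected in pinned_versions.items():
        try:
            found = get_version(dist)
        except PackageNotFoundError:
            problems.append(f"{dist} not installed (expected {expected})")
            continue
        if found != expected:
            problems.append(f"{dist} {found} installed, expected {expected}")
    return problems
```

The `env` and `get_version` parameters exist so the check itself can be tested without touching the real environment.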
Quick triage order
When a result looks wrong, this order is usually safer than tweaking prompts first:
- verify ticker, symbol, timeframe, and market session
- verify retrieval freshness and evidence relevance
- verify role handoffs and disagreement handling
- verify memory boundaries across runs
- verify evidence-to-decision traceability
- verify risk constraints and execution assumptions
- only then tune prompts or output style
Scope note
This checklist is for diagnosing information flow, reasoning, coordination, and workflow failures in multi-agent trading systems.
It does not decide whether a strategy is profitable, whether a portfolio is optimal, or whether a market view is objectively correct.
Its purpose is to help teams find where a seemingly intelligent workflow is failing before they spend time fixing the wrong layer.