docs: add RAG failure checklist for multi-agent trading workflows

This document is a checklist for diagnosing failures in multi-agent trading workflows that use LLMs and other tools. It outlines 16 failure patterns and gives a structured order for troubleshooting.
PSBigBig + MiniPS 2026-03-16 09:51:47 +08:00 committed by GitHub
parent f362a160c3
commit 4d98d2ce85
1 changed files with 402 additions and 0 deletions


@@ -0,0 +1,402 @@
# RAG Failure Checklist for Multi-Agent Trading Workflows
A compact debugging checklist for multi-agent trading systems that use LLMs, retrieval, tools, role-based analysts, and staged decision handoffs.
This page is designed for cases where the system sounds fluent, but the output is clearly wrong, inconsistent, fragile, or difficult to trust.
## What this page is for
Use this checklist when you see symptoms like:
- analysts disagreeing but the final decision does not explain why one side won
- a trade recommendation that cannot be traced back to clear evidence
- stale context leaking into a new run
- a polished explanation built on weak or partial support
- role handoffs that lose constraints, caveats, or key market assumptions
This page is not a strategy guide, a backtest framework, or a portfolio construction manual.
Its purpose is simpler:
- improve the first diagnostic cut
- reduce wasted debugging cycles
- help identify which layer is failing before editing prompts blindly
## Quick links
- [No.1 Data reality mismatch](#no1-data-reality-mismatch)
- [No.2 Interpretation collapse](#no2-interpretation-collapse)
- [No.3 Long reasoning chain drift](#no3-long-reasoning-chain-drift)
- [No.4 Bluffing and overconfidence](#no4-bluffing-and-overconfidence)
- [No.5 Semantic mismatch in retrieval](#no5-semantic-mismatch-in-retrieval)
- [No.6 Logic collapse and recovery failure](#no6-logic-collapse-and-recovery-failure)
- [No.7 Memory breaks across sessions](#no7-memory-breaks-across-sessions)
- [No.8 Debugging as a black box](#no8-debugging-as-a-black-box)
- [No.9 Entropy collapse in outputs](#no9-entropy-collapse-in-outputs)
- [No.10 Creative freeze in analysis](#no10-creative-freeze-in-analysis)
- [No.11 Symbolic collapse](#no11-symbolic-collapse)
- [No.12 Philosophical recursion](#no12-philosophical-recursion)
- [No.13 Multi-agent chaos](#no13-multi-agent-chaos)
- [No.14 Bootstrap ordering failures](#no14-bootstrap-ordering-failures)
- [No.15 Deployment deadlock assumptions](#no15-deployment-deadlock-assumptions)
- [No.16 Pre-deploy collapse](#no16-pre-deploy-collapse)
## Summary table
| # | problem domain | what breaks | quick link |
|---|---|---|---|
| 1 | Data reality mismatch | retrieval brings the wrong asset, wrong regime, or wrong context | [No.1](#no1-data-reality-mismatch) |
| 2 | Interpretation collapse | the retrieved material is relevant, but the logic built on top of it is wrong | [No.2](#no2-interpretation-collapse) |
| 3 | Long reasoning chain drift | multi-step analysis slowly shifts away from the original task | [No.3](#no3-long-reasoning-chain-drift) |
| 4 | Bluffing and overconfidence | the model sounds certain without enough support | [No.4](#no4-bluffing-and-overconfidence) |
| 5 | Semantic mismatch in retrieval | surface similarity replaces true relevance | [No.5](#no5-semantic-mismatch-in-retrieval) |
| 6 | Logic collapse and recovery failure | the reasoning path hits a dead end but does not recover safely | [No.6](#no6-logic-collapse-and-recovery-failure) |
| 7 | Memory breaks across sessions | earlier context or assumptions are lost or mixed | [No.7](#no7-memory-breaks-across-sessions) |
| 8 | Debugging as a black box | a bad output appears, but the failure path is not visible | [No.8](#no8-debugging-as-a-black-box) |
| 9 | Entropy collapse in outputs | outputs become noisy, repetitive, or structurally incoherent | [No.9](#no9-entropy-collapse-in-outputs) |
| 10 | Creative freeze in analysis | the system gives flat and literal analysis when synthesis is needed | [No.10](#no10-creative-freeze-in-analysis) |
| 11 | Symbolic collapse | abstract or logical prompts break under structured reasoning pressure | [No.11](#no11-symbolic-collapse) |
| 12 | Philosophical recursion | self-reference loops or paradox-like reasoning trap the workflow | [No.12](#no12-philosophical-recursion) |
| 13 | Multi-agent chaos | agent roles overwrite, conflict, or misalign without resolution | [No.13](#no13-multi-agent-chaos) |
| 14 | Bootstrap ordering failures | components act before dependencies or context are ready | [No.14](#no14-bootstrap-ordering-failures) |
| 15 | Deployment deadlock assumptions | system stages wait on each other in the wrong order | [No.15](#no15-deployment-deadlock-assumptions) |
| 16 | Pre-deploy collapse | version skew, missing secrets, or incomplete setup breaks the first live run | [No.16](#no16-pre-deploy-collapse) |
## How to use this checklist
When a run goes wrong, do not start by rewriting prompts.
Use this order first:
1. verify that the input reality is correct
2. verify that the retrieved evidence is fresh and relevant
3. verify that role handoffs preserve assumptions and constraints
4. verify that disagreement is resolved explicitly
5. verify that the final answer can be traced back to evidence
6. only then tune prompts, wording, or style
Multiple failure patterns may appear at the same time. The goal is not to force everything into one box. The goal is to avoid the wrong first cut.
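
The ordered checks above can be sketched as a first-failure scan. This is only an illustration under stated assumptions: the `run` fields, the check names, and the thresholds are all hypothetical, and real checks would inspect actual pipeline state. The point is that the scan reports the earliest broken layer even when later layers are broken too.

```python
from typing import Callable, Optional

# A check returns True when that layer looks healthy.
Check = Callable[[dict], bool]

# Ordered to mirror the list above; every field name here is hypothetical.
CHECKS: list[tuple[str, Check]] = [
    ("input reality", lambda r: r["ticker"] == r["requested_ticker"]),
    ("evidence freshness", lambda r: r["evidence_age_days"] <= r["max_age_days"]),
    ("handoff constraints", lambda r: set(r["constraints_in"]) <= set(r["constraints_out"])),
    ("disagreement resolved", lambda r: r["open_conflicts"] == 0),
    ("traceability", lambda r: all(c["evidence_ids"] for c in r["claims"])),
]


def first_failing_layer(run: dict, checks: list[tuple[str, Check]] = CHECKS) -> Optional[str]:
    """Return the name of the first failing check, or None if all pass."""
    for name, check in checks:
        if not check(run):
            return name
    return None


example_run = {
    "ticker": "AAPL",
    "requested_ticker": "AAPL",
    "evidence_age_days": 12,
    "max_age_days": 5,
    "constraints_in": ["max_position_5pct"],
    "constraints_out": [],
    "open_conflicts": 1,
    "claims": [{"evidence_ids": []}],
}

# Several layers are broken here, but only the earliest one is reported.
print(first_failing_layer(example_run))  # evidence freshness
```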
---
<a id="no1-data-reality-mismatch"></a>
## No.1 Data reality mismatch
**What it means**
The system is grounded in the wrong asset, wrong timeframe, wrong market session, or wrong event background.
**What it looks like in TradingAgents**
An analyst discusses a catalyst, earnings event, or price regime that does not match the ticker or time horizon of the current run.
**Check first**
- ticker and instrument mapping
- date range and session boundaries
- alignment between retrieved inputs and the actual task
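
As a rough sketch of the alignment check, assuming each retrieved document carries a ticker tag and a publication date (the `Doc` type and its fields are hypothetical, not part of any specific framework), mismatched inputs can be flagged before any analyst sees them:

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Doc:
    ticker: str
    published: date


def mismatched_docs(docs: list[Doc], ticker: str, start: date, end: date) -> list[Doc]:
    """Return documents whose ticker or date falls outside the run's scope."""
    return [
        d for d in docs
        if d.ticker.upper() != ticker.upper() or not (start <= d.published <= end)
    ]


docs = [
    Doc("AAPL", date(2024, 5, 2)),   # in scope
    Doc("MSFT", date(2024, 5, 2)),   # wrong ticker
    Doc("AAPL", date(2023, 1, 10)),  # wrong date range
]
bad = mismatched_docs(docs, "AAPL", date(2024, 5, 1), date(2024, 5, 31))
print(len(bad))  # 2
```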
---
<a id="no2-interpretation-collapse"></a>
## No.2 Interpretation collapse
**What it means**
The retrieved material is relevant, but the interpretation built on top of it is wrong.
**What it looks like in TradingAgents**
The system retrieves the right earnings, macro, or market data, but draws the wrong conclusion from it.
**Check first**
- whether the evidence actually supports the conclusion
- whether the role is summarizing the evidence or inferring beyond it
- whether the reasoning step compresses away key conditions
---
<a id="no3-long-reasoning-chain-drift"></a>
## No.3 Long reasoning chain drift
**What it means**
A multi-step workflow slowly shifts away from the original task.
**What it looks like in TradingAgents**
A short-horizon trading task turns into a general company summary, or a portfolio-level question gets treated like a single-name research memo.
**Check first**
- task framing for each role
- time horizon consistency
- handoff wording between planner, analysts, trader, and portfolio manager
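
One way to make the time-horizon check concrete, assuming each handoff is a small record that is expected to carry a `horizon` field (a hypothetical convention), is to list the roles where the horizon changed or was dropped:

```python
def roles_off_horizon(handoffs: list[dict], expected_horizon: str) -> list[str]:
    """Return roles whose handoff no longer carries the run's time horizon."""
    return [h["role"] for h in handoffs if h.get("horizon") != expected_horizon]


handoffs = [
    {"role": "planner", "horizon": "intraday"},
    {"role": "fundamentals_analyst", "horizon": "multi-year"},  # drifted
    {"role": "trader"},  # horizon dropped during the handoff
]
print(roles_off_horizon(handoffs, "intraday"))  # ['fundamentals_analyst', 'trader']
```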
---
<a id="no4-bluffing-and-overconfidence"></a>
## No.4 Bluffing and overconfidence
**What it means**
The system sounds more certain than the evidence allows.
**What it looks like in TradingAgents**
A strong buy or sell recommendation appears even when the evidence is mixed, thin, or partly generic.
**Check first**
- confidence language
- minimum evidence threshold
- whether uncertainty can be expressed without collapsing into vague text
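
A minimum evidence threshold could look something like the sketch below. The rating labels, the default of three supporting items, and the "mixed" rule are all assumptions chosen for illustration; the only idea taken from the checklist is that a strong call should be downgraded when support is thin or mixed.

```python
def allowed_rating(proposed: str, supporting: int, contradicting: int,
                   min_support: int = 3) -> str:
    """Downgrade a strong call when the evidence is thin or mixed."""
    downgrade = {"strong buy": "buy", "strong sell": "sell"}
    if proposed not in downgrade:
        return proposed
    thin = supporting < min_support
    # Hypothetical rule: "mixed" means contradicting items reach half of supporting ones.
    mixed = contradicting * 2 >= supporting
    return downgrade[proposed] if thin or mixed else proposed


print(allowed_rating("strong buy", supporting=5, contradicting=1))   # strong buy
print(allowed_rating("strong buy", supporting=2, contradicting=0))   # buy
print(allowed_rating("strong sell", supporting=4, contradicting=2))  # sell
```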
---
<a id="no5-semantic-mismatch-in-retrieval"></a>
## No.5 Semantic mismatch in retrieval
**What it means**
The retrieval step returns content that looks similar on the surface but is not truly relevant.
**What it looks like in TradingAgents**
The run pulls related sector commentary, ETF discussion, or nearby company context instead of the most decision-critical evidence for the current symbol.
**Check first**
- retrieval filters
- ranking logic
- symbol disambiguation and query formulation
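
A simple guard against surface similarity is to filter on symbol metadata before ranking by score. The sketch below assumes each retrieval hit is a dict with `symbols`, `score`, and `text` keys; that shape is hypothetical and would need to match whatever the retriever actually returns.

```python
def rank_for_symbol(hits: list[dict], symbol: str, k: int = 3) -> list[dict]:
    """Keep only hits tagged with the symbol, then sort by similarity score."""
    on_symbol = [h for h in hits if symbol in h["symbols"]]
    return sorted(on_symbol, key=lambda h: h["score"], reverse=True)[:k]


hits = [
    {"text": "Semiconductor sector outlook", "symbols": ["SOXX"], "score": 0.91},
    {"text": "NVDA Q1 earnings call notes", "symbols": ["NVDA"], "score": 0.84},
    {"text": "AMD product launch recap", "symbols": ["AMD"], "score": 0.88},
    {"text": "NVDA export restriction filing", "symbols": ["NVDA"], "score": 0.79},
]

# The highest-scoring hit is sector commentary, so score alone would rank it first.
for hit in rank_for_symbol(hits, "NVDA"):
    print(hit["text"])
```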
---
<a id="no6-logic-collapse-and-recovery-failure"></a>
## No.6 Logic collapse and recovery failure
**What it means**
The reasoning path gets stuck, contradicts itself, or hits a dead end without a controlled reset.
**What it looks like in TradingAgents**
The system starts building a trade thesis, encounters conflicting evidence, then produces a vague compromise instead of a clean resolution.
**Check first**
- contradiction handling
- conflict resolution logic
- whether the workflow has a safe fallback when reasoning breaks
---
<a id="no7-memory-breaks-across-sessions"></a>
## No.7 Memory breaks across sessions
**What it means**
Important context is lost, mixed, or silently overwritten across steps or sessions.
**What it looks like in TradingAgents**
Earlier assumptions about risk, event windows, or market context disappear in later stages, or stale prior-run context leaks into the new run.
**Check first**
- session boundaries
- memory reset behavior
- whether prior summaries are intentionally reused or accidentally carried forward
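
To illustrate the difference between intentional reuse and accidental carry-over, here is a minimal sketch of memory scoped by run ID. The `RunMemory` class and its method names are hypothetical; the design choice it shows is that nothing crosses a run boundary unless `carry_forward` is called explicitly.

```python
class RunMemory:
    """Key-value memory partitioned by run ID (hypothetical helper)."""

    def __init__(self) -> None:
        self._store: dict[str, dict[str, str]] = {}

    def write(self, run_id: str, key: str, value: str) -> None:
        self._store.setdefault(run_id, {})[key] = value

    def read(self, run_id: str, key: str, default=None):
        return self._store.get(run_id, {}).get(key, default)

    def carry_forward(self, from_run: str, to_run: str, keys: list[str]) -> None:
        """Copy only the named keys, so reuse is always a deliberate act."""
        for key in keys:
            if key in self._store.get(from_run, {}):
                self.write(to_run, key, self._store[from_run][key])


memory = RunMemory()
memory.write("run-1", "risk_note", "earnings in 2 days")
print(memory.read("run-2", "risk_note"))  # None: nothing leaks by default
memory.carry_forward("run-1", "run-2", ["risk_note"])
print(memory.read("run-2", "risk_note"))  # earnings in 2 days
```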
---
<a id="no8-debugging-as-a-black-box"></a>
## No.8 Debugging as a black box
**What it means**
A bad result appears, but the failure path is not visible enough to inspect.
**What it looks like in TradingAgents**
The final answer is clearly wrong, but there is no easy way to see which role, retrieval step, or tool output caused the divergence.
**Check first**
- intermediate role outputs
- evidence traces
- whether the pipeline preserves enough diagnostics to locate the break
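
A lightweight trace can make the failure path inspectable. The sketch below assumes each role reports which evidence IDs it consumed; `TraceStep`, `RunTrace`, and the role names are hypothetical and stand in for whatever logging the pipeline already has.

```python
from dataclasses import dataclass, field


@dataclass
class TraceStep:
    role: str
    evidence_ids: list[str]
    output: str


@dataclass
class RunTrace:
    steps: list[TraceStep] = field(default_factory=list)

    def record(self, role: str, evidence_ids: list[str], output: str) -> None:
        self.steps.append(TraceStep(role, list(evidence_ids), output))

    def roles_using(self, evidence_id: str) -> list[str]:
        """Roles that consumed a given piece of evidence, in run order."""
        return [s.role for s in self.steps if evidence_id in s.evidence_ids]

    def unsupported_roles(self) -> list[str]:
        """Roles that produced output without citing any evidence."""
        return [s.role for s in self.steps if not s.evidence_ids]


trace = RunTrace()
trace.record("news_analyst", ["doc-7"], "Guidance was cut.")
trace.record("trader", ["doc-7", "px-1"], "Reduce exposure.")
trace.record("portfolio_manager", [], "Approved.")
print(trace.roles_using("doc-7"))   # ['news_analyst', 'trader']
print(trace.unsupported_roles())    # ['portfolio_manager']
```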
---
<a id="no9-entropy-collapse-in-outputs"></a>
## No.9 Entropy collapse in outputs
**What it means**
The output becomes noisy, repetitive, unstable, or structurally incoherent.
**What it looks like in TradingAgents**
Analyst summaries repeat the same point, mix unrelated ideas, or lose clear structure as the run grows longer.
**Check first**
- summarization compression
- output formatting constraints
- whether upstream noise is being propagated downstream
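
Repetition is one symptom that is cheap to measure. The sketch below is a crude heuristic, not a definitive metric: it splits on sentence-ending punctuation and reports the share of sentences that are exact duplicates after lowercasing. Any alert threshold on top of it would be an assumption to tune per workflow.

```python
import re


def repeated_sentence_ratio(text: str) -> float:
    """Fraction of sentences that duplicate an earlier sentence (0.0 to 1.0)."""
    sentences = [s.strip().lower() for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    return 1 - len(set(sentences)) / len(sentences)


summary = "Margins are improving. Margins are improving. Guidance was raised."
print(round(repeated_sentence_ratio(summary), 2))  # 0.33
```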
---
<a id="no10-creative-freeze-in-analysis"></a>
## No.10 Creative freeze in analysis
**What it means**
The system becomes too literal and fails to synthesize when synthesis is needed.
**What it looks like in TradingAgents**
Instead of connecting catalysts, risk, timing, and market structure into a useful decision frame, the output stays flat and list-like.
**Check first**
- whether the prompt asks only for summary
- whether roles are allowed to synthesize across signals
- whether the workflow over-optimizes for safe repetition
---
<a id="no11-symbolic-collapse"></a>
## No.11 Symbolic collapse
**What it means**
The system breaks when asked to maintain structured, abstract, or logical relationships across variables.
**What it looks like in TradingAgents**
Position sizing logic, scenario trees, or conditional reasoning breaks when multiple variables must stay aligned.
**Check first**
- whether abstract conditions are represented explicitly
- whether the model is asked to hold too many implicit variables at once
- whether structure is lost during natural-language compression
---
<a id="no12-philosophical-recursion"></a>
## No.12 Philosophical recursion
**What it means**
The workflow falls into self-reference, circular evaluation, or paradox-like loops.
**What it looks like in TradingAgents**
One role justifies its output by citing another role that was itself derived from the first role's assumptions, creating circular confidence.
**Check first**
- circular dependencies between roles
- whether evaluation is independent from generation
- whether justification chains loop back to their own source
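
If each role records which other roles it cites as justification, a loop can be detected with a plain graph walk. This is a sketch under that assumption; the `cites` mapping and the role names are hypothetical.

```python
def loops_back(cites: dict[str, list[str]], start: str) -> bool:
    """True if following citations from `start` eventually returns to `start`."""
    seen: set[str] = set()
    stack = list(cites.get(start, []))
    while stack:
        node = stack.pop()
        if node == start:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(cites.get(node, []))
    return False


cites = {
    "trader": ["bull_researcher"],
    "bull_researcher": ["trader"],  # justified by the role it feeds
    "news_analyst": [],
}
print(loops_back(cites, "trader"))        # True
print(loops_back(cites, "news_analyst"))  # False
```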
---
<a id="no13-multi-agent-chaos"></a>
## No.13 Multi-agent chaos
**What it means**
Multiple agents conflict, overwrite, or misalign without a clear reconciliation path.
**What it looks like in TradingAgents**
Bullish and bearish researchers disagree, the trader implicitly ignores one side, and the portfolio manager approves a recommendation without resolving the mismatch.
**Check first**
- role boundaries
- aggregation logic
- whether disagreement is made explicit before the final decision
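
Making disagreement explicit can be as simple as refusing to emit a stance while roles still conflict. In the sketch below, the role names, the stance labels, and the result shape are all hypothetical; how a conflict is actually resolved is left to the workflow.

```python
def reconcile(views: dict[str, str]) -> dict:
    """Return an agreed stance, or surface the conflict instead of hiding it."""
    if not views:
        raise ValueError("no views to reconcile")
    stances = set(views.values())
    if len(stances) > 1:
        return {"status": "needs_resolution", "conflict": dict(views)}
    return {"status": "agreed", "stance": stances.pop()}


print(reconcile({"bull_researcher": "buy", "bear_researcher": "sell"})["status"])
# needs_resolution
print(reconcile({"bull_researcher": "buy", "bear_researcher": "buy"}))
# {'status': 'agreed', 'stance': 'buy'}
```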
---
<a id="no14-bootstrap-ordering-failures"></a>
## No.14 Bootstrap ordering failures
**What it means**
A component fires before the dependencies, tools, context, or setup it needs are ready.
**What it looks like in TradingAgents**
A role begins analysis before price data, news retrieval, or prior role outputs have completed or stabilized.
**Check first**
- stage order
- tool readiness
- dependency availability at each workflow boundary
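
A readiness check at each workflow boundary might look like the following sketch. It assumes the orchestrator tracks a set of completed stages and a map of what each stage requires; both structures and all stage names are hypothetical.

```python
def missing_dependencies(stage: str, requires: dict[str, set[str]],
                         completed: set[str]) -> list[str]:
    """Return the dependencies of `stage` that have not completed yet."""
    return sorted(requires.get(stage, set()) - completed)


requires = {
    "market_analyst": {"price_data", "news_retrieval"},
    "trader": {"market_analyst"},
}
completed = {"price_data"}

# The analyst should not start: news retrieval has not finished.
print(missing_dependencies("market_analyst", requires, completed))  # ['news_retrieval']
```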
---
<a id="no15-deployment-deadlock-assumptions"></a>
## No.15 Deployment deadlock assumptions
**What it means**
The system silently depends on circular waits or incompatible stage assumptions.
**What it looks like in TradingAgents**
One stage expects validated context from another stage that is itself waiting for the first stage's output.
**Check first**
- dependency graph between stages
- blocking assumptions
- whether the orchestration logic forces circular waiting
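
If the stage dependencies can be written down as a graph, Python's standard `graphlib` module (3.9+) can report a circular wait directly. The `waits_on` mapping and stage names below are hypothetical; the sketch only shows the detection step, not how to break the cycle.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Optional


def find_circular_wait(waits_on: dict[str, set[str]]) -> Optional[list[str]]:
    """Return one cycle of stages waiting on each other, or None if there is none."""
    try:
        TopologicalSorter(waits_on).prepare()
    except CycleError as err:
        # The second argument of CycleError holds the nodes of the detected cycle.
        return list(err.args[1])
    return None


deadlocked = {"trader": {"risk_review"}, "risk_review": {"trader"}}
healthy = {"trader": {"analyst"}, "analyst": set()}

print(find_circular_wait(deadlocked) is not None)  # True
print(find_circular_wait(healthy))                 # None
```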
---
<a id="no16-pre-deploy-collapse"></a>
## No.16 Pre-deploy collapse
**What it means**
The first real run fails because setup assumptions were incomplete.
**What it looks like in TradingAgents**
A live-like or replay run breaks because of missing environment variables, version skew, incomplete tool configuration, or wrong first-call assumptions.
**Check first**
- environment setup completeness
- tool and dependency versions
- first-run assumptions that were never validated in practice
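
A preflight check for environment setup can catch the simplest case before the first run. The variable names below are placeholders, not the names any particular tool expects; substitute the keys the deployment actually needs.

```python
import os
from typing import Mapping, Optional


def missing_env(required: list[str], env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# Placeholder names for illustration only.
required = ["LLM_API_KEY", "MARKET_DATA_API_KEY"]
fake_env = {"LLM_API_KEY": "sk-test", "MARKET_DATA_API_KEY": ""}
print(missing_env(required, fake_env))  # ['MARKET_DATA_API_KEY']
```

Passing `env` explicitly keeps the check testable; calling `missing_env(required)` with no second argument reads the real process environment.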
## Quick triage order
When a result looks wrong, this order is usually safer than starting with prompt tweaks:
1. verify ticker, symbol, timeframe, and market session
2. verify retrieval freshness and evidence relevance
3. verify role handoffs and disagreement handling
4. verify memory boundaries across runs
5. verify evidence-to-decision traceability
6. verify risk constraints and execution assumptions
7. only then tune prompts or output style
## Scope note
This checklist is for diagnosing information flow, reasoning, coordination, and workflow failures in multi-agent trading systems.
It does not decide whether a strategy is profitable, whether a portfolio is optimal, or whether a market view is objectively correct.
Its purpose is to help teams find where a seemingly intelligent workflow is failing before they spend time fixing the wrong layer.