TradingAgents backend migration and rollback notes draft
Status: draft
Audience: backend/application maintainers
Scope: migrate toward the application-service boundary and result-contract-v1alpha1 with rollback safety
Current progress snapshot (2026-04)
Mainline has moved beyond pure planning, but it has not finished the full boundary migration:
- Phase 0 is effectively done: contract and architecture drafts exist.
- Phases 1-4 are partially landed:
  - backend services now project v1alpha1-style public payloads;
  - result contracts are persisted via result_store.py;
  - /ws/analysis/{task_id} and /ws/orchestrator already wrap payloads with contract_version;
  - recommendation and task-status reads already depend on application-layer shaping more than route-local reconstruction.
- Phase 5 is not complete:
  - web_dashboard/backend/main.py is still too large;
  - route-local orchestration has not been fully deleted;
  - compatibility fields still coexist with the newer contract-first path.
Also note that research provenance / node guard / profiling work is now landed on the orchestrator side. That effort complements the backend migration but should not be confused with “application boundary fully complete.”
1. Migration objective
Move backend delivery code from route-local orchestration to an application-service layer without changing the quant+LLM merge kernel behavior.
Target outcomes:
- stable result contract (v1alpha1)
- thin FastAPI transport
- application-owned task lifecycle and mapping
- rollback-safe migration using dual-read/dual-write where useful
2. Current coupling hotspots
Primary hotspot: web_dashboard/backend/main.py
It currently combines:
- route handlers
- task persistence
- subprocess creation and monitoring
- progress/stage state mutation
- result projection into API fields
- report export concerns
This file is the first migration target.
3. Recommended migration sequence
Phase 0: contract freeze draft
Deliverables:
- agree on docs/contracts/result-contract-v1alpha1.md
- agree on the application boundary in docs/architecture/application-boundary.md
Rollback:
- none needed; documentation only
Phase 1: introduce application service behind existing routes
Actions:
- add backend application modules for analysis status, live signals, and report reads
- keep existing route URLs unchanged
- move mapping logic out of route functions into service/mappers
Compatibility tactic:
- routes still return current payload shape if frontend depends on it
- internal service also emits v1alpha1 DTOs for verification comparison
Rollback:
- route handlers can call old inline functions directly via feature flag or import switch
Current status:
- partially complete on mainline via analysis_service.py, job_service.py, and result_store.py
- not complete enough yet to claim main.py is only a thin adapter
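The rollback tactic above (feature flag or import switch) can be sketched as a route-level dispatch. All function names here are hypothetical stand-ins for the real inline helpers in main.py and the service functions in analysis_service.py; only the flag-based switching pattern is the point.

```python
import os


def get_task_status_legacy(task_id: str) -> dict:
    """Stand-in for the old route-local inline helper."""
    return {"task_id": task_id, "source": "legacy-inline"}


def get_task_status_service(task_id: str) -> dict:
    """Stand-in for the new application-service read path."""
    return {"task_id": task_id, "source": "application-service"}


def get_task_status(task_id: str) -> dict:
    """Route-level dispatch: new service path behind a flag, legacy otherwise.

    Flipping TA_APP_SERVICE_ENABLED off restores the old behavior without
    touching route code, which is the one-step rollback this phase needs.
    """
    if os.environ.get("TA_APP_SERVICE_ENABLED") == "1":
        return get_task_status_service(task_id)
    return get_task_status_legacy(task_id)
```

Keeping the switch at the route boundary means the application service stays importable and testable even while the flag is off.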
Phase 2: dual-read for task status
Why:
Task status currently lives in memory plus data/task_status/*.json. During migration, new service storage and old persisted shape may diverge.
Recommended strategy:
- read preference: new application store first
- fallback read: legacy JSON task status
- compare key fields during shadow period:
status, progress, current_stage, decision, error
Rollback:
- switch read preference back to legacy JSON only
- leave new store populated for debugging, but non-authoritative
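The read-preference-with-fallback strategy above can be sketched as follows. The store shapes and the shadow-compare helper are illustrative assumptions; only the key-field list comes from this document.

```python
import json
from pathlib import Path
from typing import Optional

# Fields to compare during the shadow period (from the migration notes).
KEY_FIELDS = ("status", "progress", "current_stage", "decision", "error")


def read_task_status(task_id: str, new_store: dict, legacy_dir: Path) -> Optional[dict]:
    """Dual-read: prefer the new application store, fall back to legacy JSON."""
    if task_id in new_store:
        return new_store[task_id]
    legacy_path = legacy_dir / f"{task_id}.json"  # data/task_status/*.json analogue
    if legacy_path.exists():
        return json.loads(legacy_path.read_text())
    return None


def shadow_compare(new: dict, legacy: dict) -> list:
    """Return the key fields on which the two stores disagree."""
    return [f for f in KEY_FIELDS if new.get(f) != legacy.get(f)]
```

Rollback is then just swapping the branch order in read_task_status (or gating it on TA_TASKSTORE_DUAL_READ), leaving the new store populated but non-authoritative.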
Phase 3: dual-write for task results
Why:
To avoid breaking status pages and historical tooling during rollout.
Recommended strategy:
- authoritative write: new application store
- compatibility write: legacy app.state.task_results + data/task_status/*.json
- emit diff logs when new-vs-legacy projections disagree
Guardrails:
- dual-write only for application-layer payloads
- do not dual-write alternate domain semantics into orchestrator/
Rollback:
- disable new-store writes
- continue legacy writes only
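A minimal sketch of the dual-write-plus-diff-log idea, assuming the new store and the legacy store are both dict-like and the legacy JSON mirror matches data/task_status/*.json. The function signature and return value are illustrative, not the real persistence API.

```python
import json
import logging
from pathlib import Path

logger = logging.getLogger("ta.dual_write")


def write_task_result(task_id: str, contract_payload: dict, legacy_payload: dict,
                      new_store: dict, legacy_results: dict, legacy_dir: Path) -> list:
    """Authoritative write to the new store; compatibility writes mirror the
    legacy in-memory dict and JSON file. Returns the shared fields on which
    the two projections disagree so callers can emit diff logs."""
    new_store[task_id] = contract_payload                       # authoritative
    legacy_results[task_id] = legacy_payload                    # app.state.task_results analogue
    (legacy_dir / f"{task_id}.json").write_text(json.dumps(legacy_payload))
    shared = contract_payload.keys() & legacy_payload.keys()
    diffs = sorted(k for k in shared if contract_payload[k] != legacy_payload[k])
    if diffs:
        logger.warning("projection mismatch for %s on fields %s", task_id, diffs)
    return diffs
```

Rollback under this shape is skipping the first line (or gating it on TA_TASKSTORE_DUAL_WRITE), leaving only the legacy writes.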
Phase 4: websocket and live signal migration
Actions:
- make /ws/analysis/{task_id} and /ws/orchestrator render application contracts
- keep websocket wrapper fields stable while migrating the internal body shape
Suggested compatibility step:
- send the legacy event envelope with an embedded contract_version
- update frontend consumers before removing legacy-only fields
Rollback:
- restore websocket serializer to legacy shape
- keep application service intact behind adapter
Current status:
- partially complete on mainline:
  - /ws/orchestrator already emits contract_version, data_quality, degradation, and research
  - /ws/analysis/{task_id} already reads application-shaped task state
Phase 5: remove route-local orchestration
Actions:
- delete dead inline task mutation helpers from main.py
- keep routes as a thin adapter layer
- preserve report retrieval behavior
Rollback:
- only safe after shadow metrics show parity
- otherwise revert to Phase 3 dual-write mode, not direct deletion
4. Suggested feature flags
Environment-variable style examples:
TA_APP_SERVICE_ENABLED=1
TA_RESULT_CONTRACT_VERSION=v1alpha1
TA_TASKSTORE_DUAL_READ=1
TA_TASKSTORE_DUAL_WRITE=1
TA_WS_V1ALPHA1_ENABLED=0
These names are placeholders; exact naming can be chosen during implementation.
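Since the names are placeholders, one way to keep them in one place is a small frozen settings object read once at startup, sketched here under the placeholder names above:

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class MigrationFlags:
    """Snapshot of the migration feature flags, read once at startup."""
    app_service_enabled: bool
    contract_version: str
    dual_read: bool
    dual_write: bool
    ws_v1alpha1: bool

    @classmethod
    def from_env(cls) -> "MigrationFlags":
        def flag(name: str) -> bool:
            return os.environ.get(name, "0") == "1"
        return cls(
            app_service_enabled=flag("TA_APP_SERVICE_ENABLED"),
            contract_version=os.environ.get("TA_RESULT_CONTRACT_VERSION", "v1alpha1"),
            dual_read=flag("TA_TASKSTORE_DUAL_READ"),
            dual_write=flag("TA_TASKSTORE_DUAL_WRITE"),
            ws_v1alpha1=flag("TA_WS_V1ALPHA1_ENABLED"),
        )
```

A frozen dataclass makes the cutover state explicit in logs and prevents flags from drifting mid-request.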
5. Verification checkpoints per phase
For each migration phase, verify:
- same task ids are returned for the same route behavior
- stage transitions remain monotonic
- completed tasks persist decision, confidence, and degraded-path outcomes
- failure path still preserves actionable error text
- live websocket payloads preserve ticker/date ordering expectations
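The "stage transitions remain monotonic" checkpoint lends itself to an automated shadow check. The stage names below are hypothetical; the real stage vocabulary lives in the backend pipeline.

```python
# Hypothetical stage ordering; substitute the backend's actual stage names.
STAGE_ORDER = ["queued", "running", "analyzing", "completed"]


def stages_monotonic(observed: list) -> bool:
    """True if the observed stage sequence never moves backwards in STAGE_ORDER.

    Unknown stage names are skipped rather than treated as violations, so the
    check stays usable while the stage vocabulary is still in flux.
    """
    indices = [STAGE_ORDER.index(s) for s in observed if s in STAGE_ORDER]
    return all(a <= b for a, b in zip(indices, indices[1:]))
```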
6. Rollback triggers
Rollback immediately if any of these happen:
- task status disappears after backend restart
- WebSocket clients stop receiving progress updates
- completed analysis loses decision or confidence fields
- degraded single-lane signals are reclassified incorrectly
- report export or historical report retrieval cannot find prior artifacts
7. Explicit non-goals during migration
- do not rewrite orchestrator/signals.py merge math as part of the boundary migration
- do not rework provider/model selection semantics in the same change set
- do not force frontend redesign before contract shadowing proves parity
- do not implement a new strategy layer inside the application service
8. Minimal rollback playbook
If production or local verification fails after migration cutover:
- disable application-service read path
- disable dual-write to new store if it corrupts parity checks
- restore legacy route-local serializers
- keep generated comparison logs/artifacts for diff analysis
- re-run backend tests and one end-to-end manual analysis flow
9. Review checklist
A migration plan is acceptable only if it:
- preserves orchestrator ownership of quant+LLM merge semantics
- introduces feature-flagged cutover points
- supports dual-read/dual-write only at application/persistence boundary
- provides a one-step rollback path at each release phase
10. Maintainer note
When updating migration status, keep these three documents aligned:
- docs/architecture/application-boundary.md
- docs/contracts/result-contract-v1alpha1.md
- docs/architecture/research-provenance.md
The first two describe backend/application convergence; the third describes orchestrator-side research degradation and profiling semantics that now feed those contracts.