TradingAgents backend migration and rollback notes draft

Status: draft
Audience: backend/application maintainers
Scope: migrate toward application-service boundary and result-contract-v1alpha1 with rollback safety

Current progress snapshot (2026-04)

Mainline has moved beyond pure planning, but it has not finished the full boundary migration:

Phase 0 is effectively done: contract and architecture drafts exist.
Phases 1-4 are partially landed:
- backend services now project v1alpha1-style public payloads;
- result contracts are persisted via result_store.py;
- /ws/analysis/{task_id} and /ws/orchestrator already wrap payloads with contract_version;
- recommendation and task-status reads already depend on application-layer shaping more than route-local reconstruction.
Phase 5 is not complete:
- web_dashboard/backend/main.py is still too large;
- route-local orchestration has not been fully deleted;
- compatibility fields still coexist with the newer contract-first path.

Research provenance, node guard, and profiling work has also landed on the orchestrator side. That effort complements the backend migration, but it does not mean the application boundary is fully complete.

1. Migration objective

Move backend delivery code from route-local orchestration to an application-service layer without changing the quant+LLM merge kernel behavior.

Target outcomes:

stable result contract (v1alpha1)
thin FastAPI transport
application-owned task lifecycle and mapping
rollback-safe migration using dual-read/dual-write where useful

2. Current coupling hotspots

Primary hotspot: web_dashboard/backend/main.py

It currently combines:

route handlers
task persistence
subprocess creation and monitoring
progress/stage state mutation
result projection into API fields
report export concerns

This file is the first migration target.

3. Recommended migration sequence

Phase 0: contract freeze draft

Deliverables:

agree on docs/contracts/result-contract-v1alpha1.md
agree on application boundary in docs/architecture/application-boundary.md

Rollback:

none needed; documentation only

Phase 1: introduce application service behind existing routes

Actions:

add backend application modules for analysis status, live signals, and report reads
keep existing route URLs unchanged
move mapping logic out of route functions into service/mappers

Compatibility tactic:

routes still return current payload shape if frontend depends on it
internal service also emits v1alpha1 DTOs for verification comparison

Rollback:

route handlers can call old inline functions directly via feature flag or import switch

Current status:

partially complete on mainline via analysis_service.py, job_service.py, and result_store.py
not complete enough yet to claim main.py is only a thin adapter
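
The Phase 1 rollback switch can be sketched as below. This is a minimal illustration, not the mainline code: `legacy_inline_status`, `service_status`, and `resolve_status_handler` are hypothetical names, the returned payloads are invented stand-ins, and the flag name is the placeholder from section 4.

```python
import os
from typing import Any, Callable, Dict


def legacy_inline_status(task_id: str) -> Dict[str, Any]:
    """Stand-in for the old route-local mapping, kept importable for rollback."""
    return {"task_id": task_id, "status": "running", "source": "legacy"}


def service_status(task_id: str) -> Dict[str, Any]:
    """Stand-in for the new application-service mapping."""
    return {"task_id": task_id, "status": "running", "source": "service"}


def resolve_status_handler() -> Callable[[str], Dict[str, Any]]:
    """Pick the implementation per call, so rollback needs no code change."""
    # TA_APP_SERVICE_ENABLED is a placeholder flag name (assumption).
    if os.environ.get("TA_APP_SERVICE_ENABLED", "0") == "1":
        return service_status
    return legacy_inline_status
```

A route handler would call `resolve_status_handler()(task_id)` and return the result unchanged, which keeps the route URL and payload shape stable while the implementation behind it is swapped.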

Phase 2: dual-read for task status

Why:

Task status currently lives in memory plus data/task_status/*.json. During migration, new service storage and old persisted shape may diverge.

Recommended strategy:

read preference: new application store first
fallback read: legacy JSON task status
compare key fields during shadow period: status, progress, current_stage, decision, error

Rollback:

switch read preference back to legacy JSON only
leave new store populated for debugging, but non-authoritative
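
A dual-read with shadow comparison might look like the following sketch. Both stores are modeled as plain mappings keyed by task id, which is an assumption; the real stores (in-memory state, data/task_status/*.json, the new application store) would sit behind the same two lookups. The function and logger names are hypothetical.

```python
import logging
from typing import Any, Dict, Mapping, Optional, Tuple

logger = logging.getLogger("taskstore.shadow")

# Fields compared during the shadow period, as listed above.
SHADOW_FIELDS = ("status", "progress", "current_stage", "decision", "error")


def diff_fields(
    new: Mapping[str, Any], legacy: Mapping[str, Any]
) -> Dict[str, Tuple[Any, Any]]:
    """Return {field: (new_value, legacy_value)} for every mismatching field."""
    return {
        f: (new.get(f), legacy.get(f))
        for f in SHADOW_FIELDS
        if new.get(f) != legacy.get(f)
    }


def read_task_status(
    task_id: str,
    new_store: Mapping[str, Mapping[str, Any]],
    legacy_store: Mapping[str, Mapping[str, Any]],
    prefer_new: bool = True,
) -> Optional[Dict[str, Any]]:
    """Read from the new store first, fall back to legacy, log divergence."""
    new = new_store.get(task_id)
    legacy = legacy_store.get(task_id)
    if new is not None and legacy is not None:
        mismatches = diff_fields(new, legacy)
        if mismatches:
            logger.warning("task %s shadow mismatch: %s", task_id, mismatches)
    if prefer_new:
        chosen = new if new is not None else legacy
    else:
        # Rollback mode: legacy is the only authoritative source.
        chosen = legacy
    return dict(chosen) if chosen is not None else None
```

Passing `prefer_new=False` corresponds to the rollback step: the new store keeps being compared for debugging, but it is never returned to callers.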

Phase 3: dual-write for task results

Why:

To avoid breaking status pages and historical tooling during rollout.

Recommended strategy:

authoritative write: new application store
compatibility write: legacy app.state.task_results + data/task_status/*.json
emit diff logs when new-vs-legacy projections disagree

Guardrails:

dual-write only for application-layer payloads
do not dual-write alternate domain semantics into orchestrator/

Rollback:

disable new-store writes
continue legacy writes only
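
The dual-write strategy above can be sketched as follows. The contract shape used here (`status` plus a nested `result` with `decision` and `confidence`) is an assumption for illustration and is not taken from result-contract-v1alpha1.md; `project_legacy` and `write_task_result` are hypothetical names, and both stores are modeled as dictionaries.

```python
import logging
from typing import Any, Dict, MutableMapping, Tuple

logger = logging.getLogger("taskstore.dualwrite")


def project_legacy(contract: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten an (assumed) contract payload into the legacy field layout."""
    result = contract.get("result", {})
    return {
        "status": contract.get("status"),
        "decision": result.get("decision"),
        "confidence": result.get("confidence"),
    }


def write_task_result(
    task_id: str,
    contract: Dict[str, Any],
    legacy_payload: Dict[str, Any],
    new_store: MutableMapping[str, Dict[str, Any]],
    legacy_store: MutableMapping[str, Dict[str, Any]],
    write_new: bool = True,
) -> Dict[str, Tuple[Any, Any]]:
    """Write both stores and return {field: (projected, legacy)} mismatches."""
    if write_new:
        new_store[task_id] = contract  # authoritative write
    legacy_store[task_id] = legacy_payload  # compatibility write
    projected = project_legacy(contract)
    diffs = {
        key: (value, legacy_payload.get(key))
        for key, value in projected.items()
        if value != legacy_payload.get(key)
    }
    if diffs:
        logger.warning("task %s new-vs-legacy diff: %s", task_id, diffs)
    return diffs
```

Calling it with `write_new=False` matches the rollback step: legacy writes continue untouched while the new store stops receiving data.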

Phase 4: websocket and live signal migration

Actions:

make /ws/analysis/{task_id} and /ws/orchestrator render application contracts
keep websocket wrapper fields stable while migrating internal body shape

Suggested compatibility step:

send legacy event envelope with embedded contract_version
update frontend consumers before removing legacy-only fields

Rollback:

restore websocket serializer to legacy shape
keep application service intact behind adapter

Current status:

partially complete on mainline
/ws/orchestrator already emits contract_version, data_quality, degradation, and research
/ws/analysis/{task_id} already reads application-shaped task state
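
The compatibility step for websockets (legacy envelope plus embedded contract_version) might be rendered like this. The wrapper keys `type` and `task_id`, the `payload` key, and the function name are assumptions for illustration; the actual envelope fields are defined by the existing serializers.

```python
from typing import Any, Dict, Mapping

CONTRACT_VERSION = "v1alpha1"

# Assumed wrapper keys that must stay stable for existing clients.
WRAPPER_KEYS = ("type", "task_id")


def embed_contract(
    legacy_event: Mapping[str, Any],
    contract_body: Mapping[str, Any],
    keep_legacy_fields: bool = True,
) -> Dict[str, Any]:
    """Return a new event carrying the contract body next to legacy fields."""
    if keep_legacy_fields:
        envelope = dict(legacy_event)
    else:
        # Later stage: only wrapper keys survive, legacy-only fields are gone.
        envelope = {k: legacy_event[k] for k in WRAPPER_KEYS if k in legacy_event}
    envelope["contract_version"] = CONTRACT_VERSION
    envelope["payload"] = dict(contract_body)
    return envelope
```

Keeping `keep_legacy_fields=True` until frontend consumers read from `payload` gives a window in which both old and new clients can parse the same message.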

Phase 5: remove route-local orchestration

Actions:

delete dead inline task mutation helpers from main.py
keep routes as thin adapter layer
preserve report retrieval behavior

Rollback:

deleting the inline helpers is only safe after shadow metrics show parity
if parity is not shown, stay in (or revert to) Phase 3 dual-write mode instead of deleting

4. Suggested feature flags

Environment-variable style examples:

TA_APP_SERVICE_ENABLED=1
TA_RESULT_CONTRACT_VERSION=v1alpha1
TA_TASKSTORE_DUAL_READ=1
TA_TASKSTORE_DUAL_WRITE=1
TA_WS_V1ALPHA1_ENABLED=0

These names are placeholders; exact naming can be chosen during implementation.
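
One way to read these flags in a single place is sketched below. The flag names are the placeholders listed above; the `MigrationFlags` class, `load_flags` function, the set of truthy strings, and the "everything off" defaults are assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

_TRUTHY = frozenset({"1", "true", "yes", "on"})


@dataclass(frozen=True)
class MigrationFlags:
    app_service_enabled: bool
    result_contract_version: str
    taskstore_dual_read: bool
    taskstore_dual_write: bool
    ws_v1alpha1_enabled: bool


def _as_bool(env: Mapping[str, str], name: str) -> bool:
    # Unset flags count as off, so a missing variable means legacy behavior.
    return env.get(name, "0").strip().lower() in _TRUTHY


def load_flags(env: Optional[Mapping[str, str]] = None) -> MigrationFlags:
    """Build an immutable flag snapshot from the environment (or a test dict)."""
    env = os.environ if env is None else env
    return MigrationFlags(
        app_service_enabled=_as_bool(env, "TA_APP_SERVICE_ENABLED"),
        result_contract_version=env.get("TA_RESULT_CONTRACT_VERSION", "v1alpha1"),
        taskstore_dual_read=_as_bool(env, "TA_TASKSTORE_DUAL_READ"),
        taskstore_dual_write=_as_bool(env, "TA_TASKSTORE_DUAL_WRITE"),
        ws_v1alpha1_enabled=_as_bool(env, "TA_WS_V1ALPHA1_ENABLED"),
    )
```

Defaulting every boolean to off means that clearing the environment is itself a rollback to legacy behavior.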

5. Verification checkpoints per phase

For each migration phase, verify:

the same routes return the same task ids for the same requests
stage transitions remain monotonic
completed tasks persist decision, confidence, and degraded-path outcomes
failure path still preserves actionable error text
live websocket payloads preserve ticker/date ordering expectations
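
The monotonic-stage checkpoint can be automated with a small helper such as the one below. The stage names in the usage note are invented for illustration; the real stage list comes from the backend, and `stages_are_monotonic` is a hypothetical name.

```python
from typing import Sequence


def stages_are_monotonic(observed: Sequence[str], order: Sequence[str]) -> bool:
    """True if the observed stages never move backwards in the given order.

    Repeated stages are allowed (progress updates within one stage).
    Raises KeyError if an observed stage is not part of `order`.
    """
    rank = {name: index for index, name in enumerate(order)}
    indices = [rank[stage] for stage in observed]
    return all(a <= b for a, b in zip(indices, indices[1:]))
```

For example, with a hypothetical order `["queued", "analysts", "research", "decision"]`, the sequence `["queued", "analysts", "analysts", "decision"]` passes, while `["analysts", "queued"]` fails.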

6. Rollback triggers

Rollback immediately if any of these happen:

task status disappears after backend restart
WebSocket clients stop receiving progress updates
completed analysis loses decision or confidence fields
degraded single-lane signals are reclassified incorrectly
report export or historical report retrieval cannot find prior artifacts
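
Some of these triggers can be checked mechanically. As a minimal sketch for the "loses decision or confidence fields" trigger, assuming a flat task record and the status value "completed" (both assumptions, as is the function name):

```python
from typing import Any, List, Mapping

REQUIRED_ON_COMPLETION = ("decision", "confidence")


def missing_completion_fields(record: Mapping[str, Any]) -> List[str]:
    """List required fields that a completed task record fails to carry."""
    if record.get("status") != "completed":
        return []  # the trigger only applies to completed analyses
    return [name for name in REQUIRED_ON_COMPLETION if record.get(name) is None]
```

A non-empty return value for any completed task after cutover would count as a rollback trigger under the list above.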

7. Explicit non-goals during migration

do not rewrite orchestrator/signals.py merge math as part of boundary migration
do not rework provider/model selection semantics in the same change set
do not force frontend redesign before contract shadowing proves parity
do not implement a new strategy layer inside the application service

8. Minimal rollback playbook

If production or local verification fails after migration cutover:

disable application-service read path
disable dual-write to new store if it corrupts parity checks
restore legacy route-local serializers
keep generated comparison logs/artifacts for diff analysis
re-run backend tests and one end-to-end manual analysis flow

9. Review checklist

A migration plan is acceptable only if it:

preserves orchestrator ownership of quant+LLM merge semantics
introduces feature-flagged cutover points
supports dual-read/dual-write only at application/persistence boundary
provides a one-step rollback path at each release phase

10. Maintainer note

When updating migration status, keep these three documents aligned:

docs/architecture/application-boundary.md
docs/contracts/result-contract-v1alpha1.md
docs/architecture/research-provenance.md

The first two describe backend/application convergence; the third describes orchestrator-side research degradation and profiling semantics that now feed those contracts.
