open-multi-agent

Commit Graph

Author	SHA1	Message	Date
JackChen	4d7564b71a	feat: task-level retry with exponential backoff (#37 ) * feat: add task-level retry with exponential backoff Add `maxRetries`, `retryDelayMs`, and `retryBackoff` to task config. When a task fails and retries remain, the orchestrator waits with exponential backoff and re-runs the task with a fresh agent conversation. A `task_retry` event is emitted via `onProgress` for observability. Cascade failure only occurs after all retries are exhausted. Closes #30 * fix: address review — extract executeWithRetry, add delay cap, fix tests - Extract `executeWithRetry()` as a testable exported function - Add `computeRetryDelay()` with 30s max cap (prevents runaway backoff) - Remove retry fields from `ParsedTaskSpec` (dead code for runTeam path) - Deduplicate retry event emission (single code path for both error types) - Injectable delay function for test determinism - Rewrite tests to call the real `executeWithRetry`, not a copy - 15 tests covering: success, retry+success, retry+failure, backoff calculation, delay cap, delay function injection, no-retry default * fix: clamp negative maxRetries/retryBackoff to safe values - maxRetries clamped to >= 0 (negative values treated as no retry) - retryBackoff clamped to >= 1 (prevents zero/negative delay oscillation) - retryDelayMs clamped to >= 0 - Add tests for negative maxRetries and negative backoff Addresses Codex review P1 on #37 * fix: accumulate token usage across retry attempts Previously only the final attempt's tokenUsage was returned, causing under-reporting of actual model consumption when retries occurred. Now all attempts' token counts are summed in the returned result. Addresses Codex review P2 (token usage) on #37	2026-04-03 14:08:36 +08:00

Author

SHA1

Message

Date

JackChen

4d7564b71a

feat: task-level retry with exponential backoff (#37 )

* feat: add task-level retry with exponential backoff

Add `maxRetries`, `retryDelayMs`, and `retryBackoff` to task config.
When a task fails and retries remain, the orchestrator waits with
exponential backoff and re-runs the task with a fresh agent conversation.
A `task_retry` event is emitted via `onProgress` for observability.
Cascade failure only occurs after all retries are exhausted.

Closes #30

* fix: address review — extract executeWithRetry, add delay cap, fix tests

- Extract `executeWithRetry()` as a testable exported function
- Add `computeRetryDelay()` with 30s max cap (prevents runaway backoff)
- Remove retry fields from `ParsedTaskSpec` (dead code for runTeam path)
- Deduplicate retry event emission (single code path for both error types)
- Injectable delay function for test determinism
- Rewrite tests to call the real `executeWithRetry`, not a copy
- 15 tests covering: success, retry+success, retry+failure, backoff
  calculation, delay cap, delay function injection, no-retry default

* fix: clamp negative maxRetries/retryBackoff to safe values

- maxRetries clamped to >= 0 (negative values treated as no retry)
- retryBackoff clamped to >= 1 (prevents zero/negative delay oscillation)
- retryDelayMs clamped to >= 0
- Add tests for negative maxRetries and negative backoff

Addresses Codex review P1 on #37

* fix: accumulate token usage across retry attempts

Previously only the final attempt's tokenUsage was returned, causing
under-reporting of actual model consumption when retries occurred.
Now all attempts' token counts are summed in the returned result.

Addresses Codex review P2 (token usage) on #37

2026-04-03 14:08:36 +08:00

1 Commits