- Add contract tests for Anthropic, OpenAI, Gemini, Copilot adapters
- Add optional E2E test suite (tests/e2e/, run with npm run test:e2e)
- Add shared test fixtures (tests/helpers/llm-fixtures.ts)
- Configure vitest to exclude e2e tests by default
- Add "files" field to package.json to reduce npm package size by 50%
- Align npm description with GitHub repo description
- Bump version to 1.0.1
* feat(agent): add smart loop detection for stuck agents (#16)
Detect when agents repeat the same tool calls or text outputs in a
sliding window. Three modes: warn (inject nudge, terminate on 2nd hit),
terminate (immediate stop), or custom callback. Fully opt-in via
`loopDetection` on AgentConfig — zero overhead when unconfigured.
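
A minimal sketch of the sliding-window idea, for illustration only. The class name, the `observe()` method, the `windowSize` parameter, and the string "signature" per turn are assumptions, not the framework's actual API; only the warn-then-terminate behaviour and the `maxRepetitions` / `loopWarned` names come from these commit messages.

```typescript
// Hypothetical sketch: not the framework's implementation.
type LoopAction = "continue" | "warn" | "terminate";

class LoopDetector {
  private recent: string[] = [];
  private loopWarned = false;
  private maxRepetitions: number;
  private windowSize: number;

  constructor(maxRepetitions: number, windowSize: number) {
    this.maxRepetitions = maxRepetitions;
    this.windowSize = windowSize;
  }

  /** Record one turn's signature (e.g. tool name + JSON args) and pick an action. */
  observe(signature: string): LoopAction {
    // Keep only the most recent `windowSize` signatures.
    this.recent.push(signature);
    if (this.recent.length > this.windowSize) this.recent.shift();

    const repeats = this.recent.filter((s) => s === signature).length;
    if (repeats < this.maxRepetitions) {
      // The agent is no longer repeating: a later loop gets a fresh warning.
      this.loopWarned = false;
      return "continue";
    }
    // First hit warns, second consecutive hit terminates.
    if (this.loopWarned) return "terminate";
    this.loopWarned = true;
    return "warn";
  }
}
```

With `new LoopDetector(3, 5)`, three identical signatures in a row yield `continue`, `continue`, `warn`, and a fourth yields `terminate`.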
* fix(agent): support async onLoopDetected callbacks and prevent orphaned tool_use events
- Await onLoopDetected callback result so async functions work correctly
instead of silently falling through to 'continue'
- Move loop detection before yielding tool_use events so terminate mode
never emits tool_use without a matching tool_result
* fix(agent): reset loopWarned on recovery and rename maxRepeatedToolCalls to maxRepetitions
- Reset loopWarned flag when the agent stops repeating, so a future
loop gets a fresh warning cycle instead of immediate termination
- Rename maxRepeatedToolCalls → maxRepetitions since the threshold
applies to both tool call and text output repetition detection
* test(agent): add tests for async callback, warn recovery, and injected warning text
- Verify async onLoopDetected callback is awaited correctly
- Verify loopWarned resets after recovery, giving fresh warning cycle
- Verify WARNING TextBlock is injected into user message content
- Add @google/genai to devDependencies so types are available for
lint/test in CI (stays as optional peerDependency for consumers)
- Delete package-lock.json in CI before npm install to avoid
Mac-generated lockfile missing Linux platform-specific rollup binaries
npm ci fails on Linux CI when package-lock.json was generated on macOS,
because platform-specific optional deps (@rollup/rollup-linux-x64-gnu)
are missing from the lockfile. This is a known npm bug (#4828).
When both timeoutMs and a caller-provided abortSignal were set, the
timeout signal silently replaced the caller's signal. Now they are
combined via mergeAbortSignals() so either source can cancel the run.
Also removes dead array-handling branch in text-tool-extractor.ts
(extractJSONObjects only returns objects, never arrays).
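
One way a helper like `mergeAbortSignals()` could combine signals is sketched below. The variadic signature and the handling of `undefined` entries are assumptions; the commit only states that both sources can now cancel the run.

```typescript
// Hypothetical sketch of signal merging: the real helper's signature may differ.
function mergeAbortSignals(...signals: Array<AbortSignal | undefined>): AbortSignal {
  const controller = new AbortController();
  for (const signal of signals) {
    if (!signal) continue;
    if (signal.aborted) {
      // A source that is already aborted cancels the merged signal immediately.
      controller.abort(signal.reason);
      break;
    }
    // Otherwise forward the first abort from any source.
    signal.addEventListener("abort", () => controller.abort(signal.reason), { once: true });
  }
  return controller.signal;
}
```

On runtimes that provide it, the built-in `AbortSignal.any()` covers the same need without a custom helper.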
Local models (Ollama, vLLM) sometimes return tool calls as text instead
of using the native tool_calls wire format. This adds a safety-net
extractor that parses tool calls from model text output when native
tool_calls is empty.
- Add text-tool-extractor with support for bare JSON, code fences,
and Hermes <tool_call> tags
- Wire fallback into OpenAI adapter chat() and stream() paths
- Add onWarning callback when model ignores configured tools
- Add timeoutMs on AgentConfig for per-run abort (local models can
be slow)
- Add 26 tests for extractor and fallback behavior
- Document local model compatibility in README
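
The fallback described above could look roughly like the sketch below. The function names, the `{ name, arguments }` payload shape, and the order in which formats are tried are assumptions for illustration; the actual text-tool-extractor may behave differently.

```typescript
// Hypothetical sketch: not the contents of text-tool-extractor.ts.
interface ExtractedToolCall {
  name: string;
  arguments: Record<string, unknown>;
}

function parseCandidate(raw: string): ExtractedToolCall | undefined {
  try {
    const value: unknown = JSON.parse(raw.trim());
    // Only plain objects with a string `name` count as tool calls.
    if (typeof value !== "object" || value === null || Array.isArray(value)) return undefined;
    const obj = value as Record<string, unknown>;
    if (typeof obj.name !== "string") return undefined;
    const args = obj.arguments;
    const isObject = typeof args === "object" && args !== null && !Array.isArray(args);
    return { name: obj.name, arguments: isObject ? (args as Record<string, unknown>) : {} };
  } catch {
    return undefined; // not JSON, so not a tool call
  }
}

function extractToolCallsFromText(text: string): ExtractedToolCall[] {
  const fence = "`".repeat(3); // built at runtime to keep this example fence-safe
  const patterns = [
    /<tool_call>([\s\S]*?)<\/tool_call>/g, // Hermes-style tags
    new RegExp(fence + "(?:json)?\\s*([\\s\\S]*?)" + fence, "g"), // code fences
  ];
  for (const pattern of patterns) {
    const calls: ExtractedToolCall[] = [];
    for (const match of text.matchAll(pattern)) {
      const call = parseCandidate(match[1]);
      if (call) calls.push(call);
    }
    if (calls.length > 0) return calls;
  }
  // Last resort: the whole output is a bare JSON object.
  const bare = parseCandidate(text);
  return bare ? [bare] : [];
}
```

An adapter would only call such an extractor when the native `tool_calls` field is empty, so well-behaved models never reach it.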
* feat(orchestrator): add onApproval callback for human-in-the-loop (#32)
Add an optional `onApproval` callback to OrchestratorConfig that gates
between task execution rounds. After each batch of parallel tasks
completes, the callback receives the completed tasks and the tasks about
to start, returning true to continue or false to abort gracefully.
Key changes:
- Add 'skipped' to TaskStatus for user-initiated abort (distinct from 'failed')
- Add skip(), skipRemaining(), cascadeSkip() to TaskQueue
- Add 'task_skipped' to OrchestratorEvent for progress monitoring
- Approval gate in executeQueue() with try/catch for callback errors
- Synthesis prompt now includes skipped tasks section
- 17 new tests covering queue skip operations and orchestrator integration
Closes #32
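
The cascade half of the skip operations can be pictured with the sketch below. The `QueuedTask` shape, the `dependsOn` field, the non-'skipped'/'failed' status values, and the free-function form are assumptions; the real TaskQueue methods may be structured differently.

```typescript
// Hypothetical sketch of skip-with-cascade: not the TaskQueue implementation.
type TaskStatus = "pending" | "in_progress" | "completed" | "failed" | "skipped";

interface QueuedTask {
  id: string;
  status: TaskStatus;
  dependsOn: string[];
}

/** Skip a pending task and every pending task that transitively depends on it. */
function skipWithCascade(tasks: Map<string, QueuedTask>, id: string): string[] {
  const skipped: string[] = [];
  const stack: string[] = [id];
  while (stack.length > 0) {
    const current = stack.pop() as string;
    const task = tasks.get(current);
    // Leave running, finished, and already-skipped tasks alone.
    if (!task || task.status !== "pending") continue;
    task.status = "skipped";
    skipped.push(current);
    for (const other of tasks.values()) {
      if (other.dependsOn.includes(current)) stack.push(other.id);
    }
  }
  return skipped;
}
```

Returning the list of skipped ids gives the caller what it needs to emit one `task_skipped` event per task.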
* docs: clarify onApproval contract and add missing test scenarios
- Document skip() cascade semantics, skipRemaining() in-flight constraint,
and onApproval trigger conditions / mutation warning
- Add concurrency safety comment on completedThisRound
- Note task_skipped as breaking union addition on OrchestratorEvent
- Add 3 test scenarios: single-batch no-callback, mixed success/failure
batch, and onProgress task_skipped event relay
llama-server exposes an OpenAI-compatible API at /v1/chat/completions,
so it works with provider: 'openai' + baseURL, the same way Ollama and vLLM do.
Added to the supported providers table and the feature bullet.
Fixes #34
* feat(agent): add beforeRun / afterRun lifecycle hooks (#31)
Add optional hook callbacks to AgentConfig for cross-cutting concerns
(guardrails, logging, token budgets) without modifying framework internals.
- beforeRun: receives prompt + agent config, can modify or throw to abort
- afterRun: receives AgentRunResult, can modify or throw to fail
- Works with all three execution modes: run(), prompt(), stream()
- 15 test cases covering modify, throw, async, composition, and history integrity
* fix(agent): preserve non-text content blocks in beforeRun hook
- applyHookContext now replaces only text blocks, keeping images and
tool results intact (was silently stripping them)
- Use backward loop instead of reverse() + find() for efficiency
- Clarify JSDoc that only `prompt` is applied from hook return value
- Add test for mixed-content user messages
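
A sketch of the text-only replacement described in this fix. The block types, the `applyPromptOverride` name, and the choice to put the new text block first are assumptions; the real `applyHookContext` may order or merge blocks differently.

```typescript
// Hypothetical sketch: not the framework's applyHookContext.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "image"; source: string }
  | { type: "tool_result"; toolUseId: string; content: string };

interface Message {
  role: "user" | "assistant";
  content: ContentBlock[];
}

/** Replace the text of the last user message, keeping its non-text blocks. */
function applyPromptOverride(messages: Message[], newPrompt: string): Message[] {
  // Walk backward instead of copying and reversing the array.
  for (let i = messages.length - 1; i >= 0; i--) {
    const message = messages[i];
    if (message.role !== "user") continue;
    const nonText = message.content.filter((block) => block.type !== "text");
    const updated: Message = {
      role: "user",
      content: [{ type: "text", text: newPrompt }, ...nonText],
    };
    return [...messages.slice(0, i), updated, ...messages.slice(i + 1)];
  }
  return messages; // no user message to rewrite
}
```

The function returns a new array rather than mutating its input, which keeps the original history intact if the hook result is later discarded.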
* fix(agent): address review feedback on beforeRun/afterRun hooks
- Normalize stream done event to always yield AgentRunResult
- Move transitionTo('completed') after afterRun to fix state ordering
- Strip hook functions from BeforeRunHookContext.agent to avoid self-references
- Pass originalPrompt to applyHookContext to avoid redundant message scan
- Clarify afterRun JSDoc: not called when the run throws
- Add tests: error-path skip, outputSchema+afterRun, ctx.agent shape, multi-turn hooks
Add lightweight onTrace callback to OrchestratorConfig that emits
structured span events (llm_call, tool_call, task, agent) with timing,
token usage, and runId correlation. Zero overhead when not subscribed.
Closes #18
- Merge examples 08 (runTasks) and 09 (runTeam) into a single Gemma 4 example
- Renumber: structured output → 09, task retry → 10
- Move Author and Contributors sections to bottom in both READMEs
- Add Author section to English README
Use Number.isFinite() to sanitize maxRetries, retryDelayMs, and
retryBackoff before entering the retry loop. Prevents unbounded
retries from Infinity or broken loop bounds from NaN.
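
A sketch of how the finite check can be combined with the clamping rules listed in this log. The function names, the fallback defaults (0 retries, 1000 ms, backoff 2), and the `Math.floor` on `maxRetries` are assumptions made for illustration.

```typescript
// Hypothetical sketch: defaults and names are assumptions.
interface RetryOptions {
  maxRetries: number;
  retryDelayMs: number;
  retryBackoff: number;
}

function finiteOr(value: number | undefined, fallback: number): number {
  // Number.isFinite rejects NaN, Infinity, and -Infinity without coercion.
  return typeof value === "number" && Number.isFinite(value) ? value : fallback;
}

function sanitizeRetryOptions(raw: Partial<RetryOptions>): RetryOptions {
  return {
    maxRetries: Math.max(0, Math.floor(finiteOr(raw.maxRetries, 0))), // >= 0
    retryDelayMs: Math.max(0, finiteOr(raw.retryDelayMs, 1000)), // >= 0
    retryBackoff: Math.max(1, finiteOr(raw.retryBackoff, 2)), // >= 1
  };
}
```

Sanitizing once, before the loop starts, means the loop body can treat all three values as trusted bounds.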
* feat: add task-level retry with exponential backoff
Add `maxRetries`, `retryDelayMs`, and `retryBackoff` to task config.
When a task fails and retries remain, the orchestrator waits with
exponential backoff and re-runs the task with a fresh agent conversation.
A `task_retry` event is emitted via `onProgress` for observability.
Cascade failure only occurs after all retries are exhausted.
Closes #30
* fix: address review — extract executeWithRetry, add delay cap, fix tests
- Extract `executeWithRetry()` as a testable exported function
- Add `computeRetryDelay()` with 30s max cap (prevents runaway backoff)
- Remove retry fields from `ParsedTaskSpec` (dead code for runTeam path)
- Deduplicate retry event emission (single code path for both error types)
- Injectable delay function for test determinism
- Rewrite tests to call the real `executeWithRetry`, not a copy
- 15 tests covering: success, retry+success, retry+failure, backoff
calculation, delay cap, delay function injection, no-retry default
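
The capped backoff can be sketched as follows. The parameter order and the 1-based `attempt` convention are assumptions; only the function name and the 30s cap come from the commit message.

```typescript
// Hypothetical sketch: the real computeRetryDelay signature may differ.
const MAX_RETRY_DELAY_MS = 30_000;

/** Delay before retry number `attempt` (1-based), capped at 30 seconds. */
function computeRetryDelay(baseDelayMs: number, backoff: number, attempt: number): number {
  const uncapped = baseDelayMs * Math.pow(backoff, attempt - 1);
  return Math.min(uncapped, MAX_RETRY_DELAY_MS);
}
```

With a 1000 ms base and a backoff of 2, the first retry waits 1000 ms, the third waits 4000 ms, and the tenth would be 512000 ms uncapped but is held to 30000 ms.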
* fix: clamp negative maxRetries/retryBackoff to safe values
- maxRetries clamped to >= 0 (negative values treated as no retry)
- retryBackoff clamped to >= 1 (prevents zero/negative delay oscillation)
- retryDelayMs clamped to >= 0
- Add tests for negative maxRetries and negative backoff
Addresses Codex review P1 on #37
* fix: accumulate token usage across retry attempts
Previously only the final attempt's tokenUsage was returned, causing
under-reporting of actual model consumption when retries occurred.
Now all attempts' token counts are summed in the returned result.
Addresses Codex review P2 (token usage) on #37
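
The accumulation amounts to summing per-attempt counters, as in this sketch. The `TokenUsage` field names are assumptions; the framework's type may use different ones.

```typescript
// Hypothetical sketch: field names are assumptions.
interface TokenUsage {
  input_tokens: number;
  output_tokens: number;
}

/** Sum usage across every attempt, including the failed ones. */
function sumTokenUsage(attempts: readonly TokenUsage[]): TokenUsage {
  return attempts.reduce<TokenUsage>(
    (total, usage) => ({
      input_tokens: total.input_tokens + usage.input_tokens,
      output_tokens: total.output_tokens + usage.output_tokens,
    }),
    { input_tokens: 0, output_tokens: 0 },
  );
}
```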
- Include error feedback user turn in mergedMessages to maintain
alternating user/assistant roles required by Anthropic API
- Use explicit undefined check instead of ?? for structured merge
to preserve null as a valid structured output value
When `outputSchema` is set on AgentConfig, the agent's final text output
is parsed as JSON, validated against the Zod schema, and exposed via
`result.structured`. On validation failure a single retry with error
feedback is attempted automatically.
Closes #29
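
The parse-then-validate step can be sketched without depending on Zod directly. `SchemaLike` below mirrors the shape of Zod's `safeParse()` result so a Zod schema should fit it structurally; the `parseStructured` name, the outcome type, and the feedback wording are assumptions.

```typescript
// Hypothetical sketch: not the framework's structured-output code.
interface SchemaLike<T> {
  safeParse(data: unknown):
    | { success: true; data: T }
    | { success: false; error: { message: string } };
}

type StructuredOutcome<T> =
  | { ok: true; structured: T }
  | { ok: false; feedback: string };

function parseStructured<T>(text: string, schema: SchemaLike<T>): StructuredOutcome<T> {
  let json: unknown;
  try {
    json = JSON.parse(text);
  } catch (err) {
    // Invalid JSON: return feedback the caller can send back for the retry.
    return { ok: false, feedback: "Output was not valid JSON: " + String(err) };
  }
  const result = schema.safeParse(json);
  if (result.success) return { ok: true, structured: result.data };
  return { ok: false, feedback: "Schema validation failed: " + result.error.message };
}
```

On an `ok: false` outcome, the caller would append `feedback` as a user turn and run the model once more, matching the single automatic retry described above.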
Add examples/09-gemma4-auto-orchestration.ts demonstrating runTeam()
with Gemma 4 as the coordinator — the framework's key feature running
fully local. The coordinator successfully decomposes goals into JSON
task arrays, schedules dependencies, and synthesises results.
Verified on gemma4:e2b (5.1B params) with Ollama 0.20.0-rc1.
Add examples/08-gemma4-local.ts demonstrating a pure-local multi-agent
team using Gemma 4 via Ollama — zero API cost. Two agents (researcher +
summarizer) collaborate through a task pipeline with bash, file_write,
and file_read tools. Verified on gemma4:e2b with Ollama 0.20.0-rc1.
Update both READMEs: add example 08 to the examples table and note
Gemma 4 as a verified local model with tool-calling support.
Document 5 features we evaluated and chose not to implement
(handoffs, checkpointing, A2A, MCP, dashboard) to maintain
our "simplest multi-agent framework" positioning.
Closes#17, #20.