# VCG Agent SDK — Roadmap

Transform `open-multi-agent` into `@vcg/agent-sdk`: a turnkey agent framework for VCG's international applications. Devs get a simple agent they can pull into any app — chat agents, worker agents, scheduled jobs — all backed by our vLLM infrastructure.

---

## Phase 1: Foundation — vLLM Adapter + Package Rebranding

**Goal:** Agents can target our vLLM servers out of the box.

### 1A. Package Rename

- Rename from `open-multi-agent` to `@vcg/agent-sdk`
- Rename the `OpenMultiAgent` class to `VCGAgent` (or `AgentSDK`)
- Update all exports, doc comments, and the README

### 1B. vLLM Adapter

vLLM exposes an OpenAI-compatible API, so the adapter extends the existing OpenAI adapter pattern with a custom base URL and model config.

- **New** `src/llm/vllm.ts` — `VLLMAdapter` class
- **New** `src/llm/openai-compat.ts` — extract shared OpenAI-format helpers (message conversion, tool formatting, streaming) so both `OpenAIAdapter` and `VLLMAdapter` reuse them
- **Modify** `src/llm/adapter.ts` — add `'vllm'` to the `createAdapter()` factory
- **Modify** `src/types.ts` — add a `VLLMConfig` type and `'vllm'` to the provider unions

```typescript
interface VLLMConfig {
  baseURL: string       // e.g. "http://vllm-server:8000/v1"
  model: string         // e.g. "meta-llama/Llama-3-70b"
  apiKey?: string
  timeout?: number
  maxRetries?: number
}
```

### 1C. Centralized Configuration

- **New** `src/config/defaults.ts` — default vLLM server URL, model, common settings
- **New** `src/config/index.ts` — `loadConfig()` with priority: constructor args > env vars > config file
- Env vars: `VCG_VLLM_URL`, `VCG_VLLM_MODEL`, `VCG_VLLM_API_KEY`, `VCG_DEFAULT_PROVIDER`, `VCG_LOG_LEVEL`, `VCG_LOCALE`

---

## Phase 2: Developer Experience — Presets + Simple API

**Goal:** A working agent in ~5 lines of code.

### 2A. Agent Presets

- **New** `src/presets/chat.ts` — `createChatAgent(config?)`
  - Multi-turn history, streaming, temperature 0.7
  - Defaults to vLLM from env config
- **New** `src/presets/worker.ts` — `createWorkerAgent(config?)`
  - Single-turn (stateless), built-in tools loaded, temperature 0, maxTurns 20
- **New** `src/presets/index.ts` — re-exports

```typescript
import { createChatAgent, createWorkerAgent } from '@vcg/agent-sdk'

const chat = createChatAgent({ name: 'support-bot' })
const reply = await chat.prompt('How do I reset my password?')

const worker = createWorkerAgent({ tools: [myCustomTool] })
const result = await worker.run('Process this data file')
```

### 2B. Configuration Presets

- **New** `src/config/presets.ts` — named profiles: `'production'`, `'development'`, `'lightweight'`
- Auto-detect the environment and apply appropriate defaults

### 2C. Structured Logger

- **New** `src/logger.ts` — simple console-based logger with level filtering (`debug` | `info` | `warn` | `error` | `silent`)
- No external dependency; used by middleware, presets, and the scheduler

---

## Phase 3: Custom Tool Ecosystem

**Goal:** Pre-built tool packs, middleware, and easy custom tool authoring.

### 3A. Tool Packs

Pre-built tool collections, each a function returning `Tool[]` for configurability.

- **New** `src/tool/packs/http.ts` — `httpToolPack(config?)`: GET, POST, PUT, DELETE with auth headers, timeout, and response size limits (native `fetch`)
- **New** `src/tool/packs/database.ts` — `databaseToolPack(config?)`: generic SQL query/execute via a pluggable `DatabaseConnection` interface (DB drivers are peer deps)
- **New** `src/tool/packs/json.ts` — `jsonToolPack()`: parse, validate, transform JSON/YAML
- **New** `src/tool/packs/index.ts` — re-exports + `registerAllPacks(registry)`

```typescript
const httpTools = httpToolPack({ defaultHeaders: { Authorization: 'Bearer ...' } })
registry.registerPack(httpTools)
```

### 3B. Tool Middleware

Composable wrappers for cross-cutting concerns on tool execution.

- **New** `src/tool/middleware.ts`
  - `withLogging(tool, logger?)` — log inputs, outputs, duration
  - `withRateLimit(tool, { maxPerMinute })` — token bucket throttle
  - `withAuth(tool, validator)` — permission check before execution
  - `withTimeout(tool, ms)` — hard timeout via `AbortController`
  - `withRetry(tool, { maxRetries })` — exponential backoff on transient errors
- Composable: `withLogging(withRateLimit(myTool, { maxPerMinute: 10 }))`

### 3C. Tool Sharing Across Agents/Teams

- Add an optional `sharedToolRegistry` to `OrchestratorConfig` and `TeamConfig`
- When present, all agents in the team share the same registry instead of creating fresh ones
- Add `ToolRegistry.registerPack()` / `deregisterPack()` for bulk registration

---

## Phase 4: Built-In Request Queuing

**Goal:** Production-grade request management for LLM calls — rate limiting, prioritization, backpressure, and retry built into the framework so devs don't have to manage it themselves.

### 4A. LLM Request Queue

When multiple agents or scheduled jobs fire concurrently, raw LLM calls can overwhelm a vLLM server or hit API rate limits. A built-in queue sits between agents and adapters.
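A minimal sketch of the semaphore gating such a queue relies on (the `Semaphore` and `withLimit` names here are hypothetical, not the SDK's actual `src/utils/semaphore.ts`): at most `maxConcurrent` calls run at once, and later callers park until a finishing call hands them its slot.

```typescript
// Illustrative sketch only; the SDK's own semaphore may differ.
class Semaphore {
  private active = 0
  private waiters: Array<() => void> = []

  constructor(private readonly maxConcurrent: number) {}

  async acquire(): Promise<void> {
    if (this.active < this.maxConcurrent) {
      this.active++
      return
    }
    // At capacity: park until release() hands us a slot.
    await new Promise<void>((resolve) => this.waiters.push(resolve))
  }

  release(): void {
    const next = this.waiters.shift()
    if (next) next() // hand the slot directly to a waiter; count unchanged
    else this.active--
  }
}

// Wrap any async call (e.g. an LLM request) so it respects the cap.
async function withLimit<T>(sem: Semaphore, fn: () => Promise<T>): Promise<T> {
  await sem.acquire()
  try {
    return await fn()
  } finally {
    sem.release()
  }
}
```

In the real queue, a priority heap would replace the FIFO `waiters` array, and a token bucket would gate dequeuing on top of this concurrency cap.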
- **New** `src/llm/queue.ts` — `LLMRequestQueue` class

```typescript
interface QueueConfig {
  maxConcurrent: number     // max parallel LLM calls (default: 5)
  maxQueueSize: number      // max pending requests before rejecting (default: 100)
  defaultPriority: number   // 0 = highest (default: 5)
  rateLimit?: {
    requestsPerMinute: number  // token bucket rate limit
    burstSize?: number         // allow short bursts above the steady rate
  }
  retry?: {
    maxRetries: number         // retry on transient failures (default: 3)
    backoffMs: number          // initial backoff (default: 1000)
    backoffMultiplier: number  // exponential factor (default: 2)
    retryableErrors: string[]  // error codes/messages to retry on
  }
  timeout?: number          // per-request timeout in ms
}
```

Core behavior:

- Wraps any `LLMAdapter` transparently — agents don't know the queue exists
- Priority queue (lower number = higher priority): chat agents get priority 1, cron workers get priority 10
- Semaphore-gated concurrency (reuses the existing `src/utils/semaphore.ts`)
- Token bucket rate limiting to stay within vLLM server capacity
- Automatic retry with exponential backoff for 429s, 503s, and connection errors
- Backpressure: rejects with a clear error when the queue is full rather than growing unbounded
- Request deduplication (optional): identical prompts within a time window return the same pending result

### 4B. Queued Adapter Wrapper

- **New** `src/llm/queued-adapter.ts` — `QueuedLLMAdapter` implementing `LLMAdapter`

```typescript
class QueuedLLMAdapter implements LLMAdapter {
  constructor(inner: LLMAdapter, config?: QueueConfig)
  chat(messages, options): Promise<ChatResponse>         // enqueues, awaits turn
  stream(messages, options): AsyncIterable<StreamChunk>  // enqueues, streams when ready
  getQueueStatus(): QueueStatus
  drain(): Promise<void>  // wait for all pending requests to complete
  pause(): void           // stop dequeuing (in-flight requests finish)
  resume(): void          // resume dequeuing
}

interface QueueStatus {
  pending: number
  active: number
  completed: number
  failed: number
  avgLatencyMs: number
  queueDepth: number
}
```

### 4C. Integration Points

- **Modify** `src/llm/adapter.ts` — `createAdapter()` accepts an optional `queue?: QueueConfig`; when provided, it wraps the adapter in `QueuedLLMAdapter` automatically
- **Modify** `src/orchestrator/orchestrator.ts` — orchestrator-level queue config flows to all agents in a team, so one queue manages all LLM traffic for the team
- **Modify** `src/presets/` — presets wire up default queue config (e.g., the worker preset uses a queue with retry enabled; the chat preset uses a low-latency queue with higher priority)
- **Modify** `src/config/defaults.ts` — default queue settings in centralized config

### 4D. Per-Provider Queue Policies

Different backends have different constraints:

- **vLLM** — limited by GPU memory/batch size; queue focuses on `maxConcurrent`
- **Anthropic API** — has rate limits (RPM/TPM); queue focuses on `rateLimit`
- **OpenAI API** — similar rate limits; queue focuses on `rateLimit`

Default policies per provider are baked into config so devs don't have to tune:

```typescript
const defaultQueuePolicies: Record<string, Partial<QueueConfig>> = {
  vllm: { maxConcurrent: 8, rateLimit: undefined },  // GPU-bound
  anthropic: { maxConcurrent: 5, rateLimit: { requestsPerMinute: 50 } },
  openai: { maxConcurrent: 5, rateLimit: { requestsPerMinute: 60 } },
}
```

### 4E. Observability

- The queue emits events: `request:enqueued`, `request:started`, `request:completed`, `request:failed`, `request:retried`, `queue:full`
- Ties into the structured logger from Phase 2C
- `getQueueStatus()` is available at the adapter, orchestrator, and preset level for health dashboards

---

## Phase 5: Cron / Scheduled Agent Support

**Goal:** Agents that run on schedules with monitoring.

### 5A. Cron Scheduler

- **New** `src/scheduler/cron-agent.ts` — `CronAgent` wrapping an agent with a schedule

```typescript
const scheduled = new CronAgent({
  agent: createWorkerAgent(),
  schedule: '0 */6 * * *',  // every 6 hours
  task: 'Check service health and report',
  onComplete: (result) => webhook(result),
  onError: (err) => alerting(err),
})
scheduled.start()
```

- **New** `src/scheduler/cron-manager.ts` — registry of all scheduled agents (start/stop/list/status)
- **New** `src/scheduler/cron-parser.ts` — lightweight cron expression parser (or add the `cron-parser` dep)

### 5B. Persistence Layer

- **New** `src/scheduler/persistence.ts` — pluggable store for schedule state (last run, next run, results)
- Default: file-based JSON store
- Interface for Redis/DB backends

### 5C. Health and Monitoring

- `CronAgent.getStatus()` — last run, next run, success/failure count
- `CronManager.getHealthReport()` — all agents' status
- Optional webhook/callback on completion or failure
- **New** `src/scheduler/webhook.ts` — POST results to callback URLs with retry

---

## Phase 6: Internationalization (i18n)

**Goal:** Agents work naturally across languages and locales.

### 6A. Locale System

- **New** `src/i18n/locale-manager.ts` — manages locale-specific system prompts and tool descriptions
- **New** `src/i18n/locales/en.json`, `ja.json`, `zh-CN.json`, `ko.json` — translation maps for SDK-internal strings

```typescript
const agent = createChatAgent({ locale: 'ja-JP' })
// System prompt automatically in Japanese, tool descriptions localized
```

### 6B. Locale-Aware Tools

- Extend `ToolContext` with a `locale` field so tools can format responses appropriately (dates, numbers, currencies)
- Thread the locale through `AgentRunner` → `ToolExecutor` → tool execution

### 6C. Character Encoding

- Verify UTF-8/multibyte handling across all adapters (should already work)
- Token counting awareness for CJK scripts in context management

---

## Phase 7: Production Hardening + Examples

**Goal:** Production-ready with great onboarding.
- Structured logging throughout (pluggable logger interface)
- Error taxonomy: network errors vs tool errors vs LLM errors vs queue errors
- Graceful shutdown: drain the queue, finish in-flight requests, stop cron jobs
- Health endpoint helper for container orchestration (Kubernetes readiness/liveness)

### Examples

- `examples/05-vllm-quickstart.ts` — chat agent on vLLM
- `examples/06-custom-tools.ts` — tool packs + middleware
- `examples/07-cron-worker.ts` — scheduled agent job
- `examples/08-i18n-agent.ts` — multi-language agent
- `examples/09-queued-agents.ts` — queue config + monitoring

---

## Build Order

```
Phase 1 (vLLM + rebrand)        <- Start here, immediate value
  |
Phase 2 (presets + DX)          <- Devs can start using it
  |
Phase 3 (tool packs)      \
                           >-- Can be parallelized
Phase 4 (request queuing) /
  |
Phase 5 (cron scheduler)        <- Depends on queue (Phase 4)
  |
Phase 6 (i18n)                  <- Can start anytime after Phase 2
  |
Phase 7 (production hardening)  <- Final polish
```

## Key Architectural Decisions

| Decision | Choice | Rationale |
|----------|--------|-----------|
| vLLM adapter approach | Extend the OpenAI adapter via shared `openai-compat.ts` | vLLM is OpenAI-compatible; avoids code duplication |
| Request queue placement | Transparent wrapper around `LLMAdapter` | Agents are unaware of queuing; zero code changes for consumers |
| Queue implementation | Priority queue + semaphore + token bucket | Handles concurrency, rate limits, and fairness in one layer |
| Config management | Constructor args > env vars > config file (merged) | Flexible for different deployment contexts |
| Cron library | Lightweight internal parser (or `cron-parser` dep) | Avoids heavy dependencies |
| i18n approach | JSON locale files + template system | Simple, no heavy framework needed |
| Tool middleware | Function composition (decorator pattern) | Familiar, zero-dependency, composable |
| Presets | Factory functions returning a standard `Agent` | No new class hierarchies, just opinionated config |
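The tool-middleware decision above is easy to picture as plain function composition. A minimal sketch, assuming a simplified `Tool` shape; the real SDK types and wrapper signatures may differ:

```typescript
// Illustrative only: a pared-down Tool and two decorator-style wrappers.
interface Tool {
  name: string
  execute: (input: string) => Promise<string>
}

// withRetry: re-run the tool on failure, up to maxRetries extra attempts.
const withRetry = (tool: Tool, opts: { maxRetries: number }): Tool => ({
  ...tool,
  execute: async (input) => {
    let lastError: unknown
    for (let attempt = 0; attempt <= opts.maxRetries; attempt++) {
      try {
        return await tool.execute(input)
      } catch (err) {
        lastError = err
      }
    }
    throw lastError
  },
})

// withLogging: record duration around the inner execute.
const withLogging = (
  tool: Tool,
  log: (msg: string) => void = console.log,
): Tool => ({
  ...tool,
  execute: async (input) => {
    const start = Date.now()
    const output = await tool.execute(input)
    log(`${tool.name}: ${Date.now() - start}ms`)
    return output
  },
})
```

Because every wrapper takes a `Tool` and returns a `Tool`, ordering stays explicit: `withLogging(withRetry(tool, { maxRetries: 2 }))` logs the total duration including retries.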