
VCG Agent SDK — Roadmap

Transform open-multi-agent into @vcg/agent-sdk: a turnkey agent framework for VCG's international applications. Devs get a simple agent they can pull into any app — chat agents, worker agents, scheduled jobs — all backed by our vLLM infrastructure.


Phase 1: Foundation — vLLM Adapter + Package Rebranding ✅ COMPLETE

Goal: Agents can target our vLLM servers out of the box.

1A. Package Rename

  • Renamed from open-multi-agent to @vcg/agent-sdk
  • Renamed OpenMultiAgent class to VCGAgentSDK
  • Added deprecated OpenMultiAgent re-export alias for backward compat
  • Updated all exports, doc comments, JSDoc, and example files

1B. vLLM Adapter

  • New src/llm/openai-compat.ts — extracted shared OpenAI-format helpers (message conversion, tool formatting, response parsing, streaming) so both OpenAIAdapter and VLLMAdapter reuse them
  • New src/llm/vllm.ts — VLLMAdapter class with chat(), stream(), and healthCheck()
  • Modified src/llm/openai.ts — refactored to import from openai-compat.ts
  • Modified src/llm/adapter.ts — added 'vllm' to createAdapter() factory; accepts VLLMConfig object
  • Modified src/types.ts — added VLLMConfig type, 'vllm' to all provider unions
interface VLLMConfig {
  baseURL: string          // e.g. "http://vllm-server:8000/v1"
  model: string            // e.g. "meta-llama/Llama-3-70b"
  apiKey?: string
  timeout?: number
  maxRetries?: number
}

1C. Centralized Configuration

  • New src/config/defaults.ts — DEFAULT_CONFIG and loadConfig(overrides?) with priority: constructor args > env vars > defaults
  • New src/config/index.ts — re-exports
  • New VCGConfig type in src/types.ts
  • Env vars: VCG_VLLM_URL, VCG_VLLM_MODEL, VCG_VLLM_API_KEY, VCG_DEFAULT_PROVIDER, VCG_MAX_CONCURRENCY, VCG_LOG_LEVEL
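The precedence rule above can be sketched as a spread chain. The VCGConfig fields below are illustrative placeholders, not the final shape in src/types.ts:

```typescript
// Sketch of loadConfig's precedence: constructor args > env vars > defaults.
// Field names here are illustrative; only the merge order matters.
interface VCGConfig {
  vllmURL: string
  vllmModel: string
  maxConcurrency: number
}

const DEFAULT_CONFIG: VCGConfig = {
  vllmURL: 'http://localhost:8000/v1',
  vllmModel: 'meta-llama/Llama-3-70b',
  maxConcurrency: 5,
}

// Read only the env vars that are actually set, so unset ones don't
// clobber defaults with `undefined` during the spread below.
function envOverrides(env: Record<string, string | undefined>): Partial<VCGConfig> {
  const out: Partial<VCGConfig> = {}
  if (env.VCG_VLLM_URL) out.vllmURL = env.VCG_VLLM_URL
  if (env.VCG_VLLM_MODEL) out.vllmModel = env.VCG_VLLM_MODEL
  if (env.VCG_MAX_CONCURRENCY) out.maxConcurrency = Number(env.VCG_MAX_CONCURRENCY)
  return out
}

function loadConfig(overrides: Partial<VCGConfig> = {}): VCGConfig {
  // Later spreads win: defaults < env vars < constructor args.
  return { ...DEFAULT_CONFIG, ...envOverrides(process.env), ...overrides }
}
```

Because later spreads win, an unset env var never shadows a default, and an explicit constructor arg always wins.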

Phase 2: Developer Experience — Presets + Simple API

Goal: Working agent in ~5 lines of code.

2A. Agent Presets

  • New src/presets/chat.ts — createChatAgent(config?)
    • Multi-turn history, streaming, temperature 0.7
    • Defaults to vLLM from env config
  • New src/presets/worker.ts — createWorkerAgent(config?)
    • Single-turn (stateless), built-in tools loaded, temperature 0, maxTurns 20
  • New src/presets/index.ts — re-exports
import { createChatAgent, createWorkerAgent } from '@vcg/agent-sdk'

const chat = createChatAgent({ name: 'support-bot' })
const reply = await chat.prompt('How do I reset my password?')

const worker = createWorkerAgent({ tools: [myCustomTool] })
const result = await worker.run('Process this data file')

2B. Configuration Presets

  • New src/config/presets.ts — named profiles: 'production', 'development', 'lightweight'
  • Auto-detect environment and apply appropriate defaults
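One possible shape for the named profiles, assuming NODE_ENV-based detection; the profile values are placeholders, not the tuned defaults:

```typescript
// Hypothetical profile table; real values would live in src/config/presets.ts.
type PresetName = 'production' | 'development' | 'lightweight'

const PRESETS: Record<PresetName, { maxConcurrency: number; logLevel: string }> = {
  production:  { maxConcurrency: 8, logLevel: 'info' },
  development: { maxConcurrency: 2, logLevel: 'debug' },
  lightweight: { maxConcurrency: 1, logLevel: 'warn' },
}

// Auto-detection could key off NODE_ENV, falling back to development.
function detectPreset(env = process.env.NODE_ENV): PresetName {
  return env === 'production' ? 'production' : 'development'
}
```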

2C. Structured Logger

  • New src/logger.ts — simple console-based logger with level filtering (debug | info | warn | error | silent)
  • No external dependency, used by middleware/presets/scheduler
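A minimal sketch of such a logger, assuming JSON-line output and the level order listed above (the shape is illustrative, not the final src/logger.ts API):

```typescript
// Level-filtered console logger: messages below `minLevel` are dropped.
type LogLevel = 'debug' | 'info' | 'warn' | 'error' | 'silent'

const LEVEL_ORDER: Record<LogLevel, number> = {
  debug: 0, info: 1, warn: 2, error: 3, silent: 4,
}

function createLogger(minLevel: LogLevel = 'info') {
  const enabled = (level: LogLevel) =>
    level !== 'silent' && LEVEL_ORDER[level] >= LEVEL_ORDER[minLevel]

  const log = (level: Exclude<LogLevel, 'silent'>, msg: string, meta?: object) => {
    if (!enabled(level)) return
    // One structured JSON line per entry: timestamp, level, message, metadata.
    console[level === 'debug' ? 'log' : level](
      JSON.stringify({ ts: new Date().toISOString(), level, msg, ...meta }),
    )
  }

  return {
    enabled,
    debug: (m: string, meta?: object) => log('debug', m, meta),
    info:  (m: string, meta?: object) => log('info', m, meta),
    warn:  (m: string, meta?: object) => log('warn', m, meta),
    error: (m: string, meta?: object) => log('error', m, meta),
  }
}
```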

Phase 3: Custom Tool Ecosystem

Goal: Pre-built tool packs, middleware, and easy custom tool authoring.

3A. Tool Packs

Pre-built tool collections, each a function returning Tool[] for configurability.

  • New src/tool/packs/http.ts — httpToolPack(config?): GET, POST, PUT, DELETE with auth headers, timeout, response size limits (native fetch)
  • New src/tool/packs/database.ts — databaseToolPack(config?): generic SQL query/execute via pluggable DatabaseConnection interface (DB drivers are peer deps)
  • New src/tool/packs/json.ts — jsonToolPack(): parse, validate, transform JSON/YAML
  • New src/tool/packs/index.ts — re-exports + registerAllPacks(registry)
const httpTools = httpToolPack({ defaultHeaders: { Authorization: 'Bearer ...' } })
registry.registerPack(httpTools)

3B. Tool Middleware

Composable wrappers for cross-cutting concerns on tool execution.

  • New src/tool/middleware.ts
    • withLogging(tool, logger?) — log inputs, outputs, duration
    • withRateLimit(tool, { maxPerMinute }) — token bucket throttle
    • withAuth(tool, validator) — permission check before execution
    • withTimeout(tool, ms) — hard timeout via AbortController
    • withRetry(tool, { maxRetries }) — exponential backoff on transient errors
  • Composable: withLogging(withRateLimit(myTool, { maxPerMinute: 10 }))
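Each wrapper takes a tool and returns a tool, which is what makes the composition above work. A sketch of two of them, assuming a minimal Tool shape (name plus execute); the SDK's real Tool interface likely carries more fields:

```typescript
// Assumed minimal tool shape for illustration.
interface Tool {
  name: string
  execute(input: unknown): Promise<unknown>
}

// withTimeout: rejects if the tool doesn't settle within `ms`.
// (Signal propagation is elided; a real version would pass an
// AbortController's signal into execute to cancel the underlying work.)
function withTimeout(tool: Tool, ms: number): Tool {
  return {
    ...tool,
    async execute(input) {
      let timer: ReturnType<typeof setTimeout> | undefined
      const timeout = new Promise<never>((_, reject) => {
        timer = setTimeout(() => reject(new Error(`${tool.name} timed out after ${ms}ms`)), ms)
      })
      try {
        return await Promise.race([tool.execute(input), timeout])
      } finally {
        clearTimeout(timer)
      }
    },
  }
}

// withLogging: same pattern — wrap execute, delegate, preserve the Tool shape.
function withLogging(tool: Tool, log: (msg: string) => void = console.log): Tool {
  return {
    ...tool,
    async execute(input) {
      const start = Date.now()
      const result = await tool.execute(input)
      log(`${tool.name} finished in ${Date.now() - start}ms`)
      return result
    },
  }
}
```

Because every wrapper returns another Tool, the decorators nest in any order without the wrapped tool knowing.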

3C. Tool Sharing Across Agents/Teams

  • Add optional sharedToolRegistry to OrchestratorConfig and TeamConfig
  • When present, all agents in the team share the same registry instead of creating fresh ones
  • Add ToolRegistry.registerPack() / deregisterPack() for bulk registration

Phase 4: Built-In Request Queuing

Goal: Production-grade request management for LLM calls — rate limiting, prioritization, backpressure, and retry built into the framework so devs don't have to manage it themselves.

4A. LLM Request Queue

When multiple agents or scheduled jobs fire concurrently, raw LLM calls can overwhelm a vLLM server or hit API rate limits. A built-in queue sits between agents and adapters.

  • New src/llm/queue.ts — LLMRequestQueue class
interface QueueConfig {
  maxConcurrent: number       // max parallel LLM calls (default: 5)
  maxQueueSize: number        // max pending requests before rejecting (default: 100)
  defaultPriority: number     // 0 = highest (default: 5)
  rateLimit?: {
    requestsPerMinute: number // token bucket rate limit
    burstSize?: number        // allow short bursts above steady rate
  }
  retry?: {
    maxRetries: number        // retry on transient failures (default: 3)
    backoffMs: number         // initial backoff (default: 1000)
    backoffMultiplier: number // exponential factor (default: 2)
    retryableErrors: string[] // error codes/messages to retry on
  }
  timeout?: number            // per-request timeout in ms
}

Core behavior:

  • Wraps any LLMAdapter transparently — agents don't know the queue exists
  • Priority queue (lower number = higher priority): chat agents get priority 1, cron workers get priority 10
  • Semaphore-gated concurrency (reuses existing src/utils/semaphore.ts)
  • Token bucket rate limiting to stay within vLLM server capacity
  • Automatic retry with exponential backoff for 429s, 503s, and connection errors
  • Backpressure: rejects with a clear error when queue is full rather than growing unbounded
  • Request deduplication (optional): identical prompts within a time window return the same pending result

4B. Queued Adapter Wrapper

  • New src/llm/queued-adapter.ts — QueuedLLMAdapter implementing LLMAdapter
class QueuedLLMAdapter implements LLMAdapter {
  constructor(inner: LLMAdapter, config?: QueueConfig)
  chat(messages, options): Promise<LLMResponse>   // enqueues, awaits turn
  stream(messages, options): AsyncIterable<StreamEvent>  // enqueues, streams when ready
  getQueueStatus(): QueueStatus
  drain(): Promise<void>   // wait for all pending requests to complete
  pause(): void            // stop dequeuing (in-flight requests finish)
  resume(): void           // resume dequeuing
}

interface QueueStatus {
  pending: number
  active: number
  completed: number
  failed: number
  avgLatencyMs: number
  queueDepth: number
}

4C. Integration Points

  • Modify src/llm/adapter.ts — createAdapter() accepts an optional queue?: QueueConfig; when provided, wraps the adapter in QueuedLLMAdapter automatically
  • Modify src/orchestrator/orchestrator.ts — orchestrator-level queue config flows to all agents in a team, so one queue manages all LLM traffic for the team
  • Modify src/presets/ — presets wire up default queue config (e.g., worker preset uses queue with retry enabled, chat preset uses low-latency queue with higher priority)
  • Modify src/config/defaults.ts — default queue settings in centralized config

4D. Per-Provider Queue Policies

Different backends have different constraints:

  • vLLM — limited by GPU memory/batch size; queue focuses on maxConcurrent
  • Anthropic API — has rate limits (RPM/TPM); queue focuses on rateLimit
  • OpenAI API — similar rate limits; queue focuses on rateLimit

Default policies per provider baked into config so devs don't have to tune:

const defaultQueuePolicies: Record<SupportedProvider, Partial<QueueConfig>> = {
  vllm:      { maxConcurrent: 8, rateLimit: undefined },        // GPU-bound
  anthropic: { maxConcurrent: 5, rateLimit: { requestsPerMinute: 50 } },
  openai:    { maxConcurrent: 5, rateLimit: { requestsPerMinute: 60 } },
}

4E. Observability

  • Queue emits events: request:enqueued, request:started, request:completed, request:failed, request:retried, queue:full
  • Ties into the structured logger from Phase 2C
  • getQueueStatus() available at adapter, orchestrator, and preset level for health dashboards
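One way the counters behind getQueueStatus() could be maintained is by subscribing to those same events. The wiring below is illustrative and assumes the queue is (or wraps) a Node EventEmitter:

```typescript
import { EventEmitter } from 'node:events'

type QueueEvent =
  | 'request:enqueued' | 'request:started' | 'request:completed'
  | 'request:failed' | 'request:retried' | 'queue:full'

// Maintains live counters from the queue's lifecycle events.
// `queue` is any EventEmitter that emits the names above.
function trackQueue(queue: EventEmitter) {
  const status = { pending: 0, active: 0, completed: 0, failed: 0 }
  queue.on('request:enqueued', () => status.pending++)
  queue.on('request:started', () => { status.pending--; status.active++ })
  queue.on('request:completed', () => { status.active--; status.completed++ })
  queue.on('request:failed', () => { status.active--; status.failed++ })
  return status
}
```

The same subscription point is where the Phase 2C logger would hook in, logging each event as a structured line.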

Phase 5: Cron / Scheduled Agent Support

Goal: Agents that run on schedules with monitoring.

5A. Cron Scheduler

  • New src/scheduler/cron-agent.ts — CronAgent wrapping an agent with a schedule
const scheduled = new CronAgent({
  agent: createWorkerAgent(),
  schedule: '0 */6 * * *',
  task: 'Check service health and report',
  onComplete: (result) => webhook(result),
  onError: (err) => alerting(err),
})
scheduled.start()
  • New src/scheduler/cron-manager.ts — registry of all scheduled agents (start/stop/list/status)
  • New src/scheduler/cron-parser.ts — lightweight cron expression parser (or add cron-parser dep)
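If the internal-parser route is taken, each of the five cron fields expands into a set of allowed values. A sketch for a single field that handles *, */n, comma lists, and plain numbers (a real parser would also need a-b ranges):

```typescript
// Expand one cron field (e.g. the hour field, 0-23) into the set of
// matching values. Ranges ('a-b') are deliberately omitted from this sketch.
function parseCronField(field: string, min: number, max: number): Set<number> {
  const values = new Set<number>()
  for (const part of field.split(',')) {
    if (part === '*') {
      for (let v = min; v <= max; v++) values.add(v)
    } else if (part.startsWith('*/')) {
      const step = Number(part.slice(2))
      for (let v = min; v <= max; v += step) values.add(v)
    } else {
      values.add(Number(part))
    }
  }
  return values
}
```

For the '0 */6 * * *' example above, the hour field expands to {0, 6, 12, 18}; computing the next run is then a search for the earliest timestamp whose fields are all in their sets.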

5B. Persistence Layer

  • New src/scheduler/persistence.ts — pluggable store for schedule state (last run, next run, results)
  • Default: file-based JSON store
  • Interface for Redis/DB backends
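The pluggable store can be little more than get/set keyed by job id. A sketch of the interface with an in-memory reference implementation; the file-based default and a Redis backend would implement the same interface (field names are illustrative):

```typescript
// Illustrative schedule-state record.
interface ScheduleState {
  lastRunAt?: string   // ISO timestamp
  nextRunAt?: string
  lastResult?: unknown
}

// The pluggable contract: async so disk- and network-backed stores fit.
interface ScheduleStore {
  get(jobId: string): Promise<ScheduleState | undefined>
  set(jobId: string, state: ScheduleState): Promise<void>
}

// In-memory reference implementation. The default file-based JSON store
// would serialize this map to disk; a Redis backend would map it to a hash.
class MemoryScheduleStore implements ScheduleStore {
  private states = new Map<string, ScheduleState>()
  async get(jobId: string) { return this.states.get(jobId) }
  async set(jobId: string, state: ScheduleState) { this.states.set(jobId, state) }
}
```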

5C. Health and Monitoring

  • CronAgent.getStatus() — last run, next run, success/failure count
  • CronManager.getHealthReport() — all agents' status
  • Optional webhook/callback on completion or failure
  • New src/scheduler/webhook.ts — POST results to callback URLs with retry

Phase 6: Internationalization (i18n)

Goal: Agents work naturally across languages and locales.

6A. Locale System

  • New src/i18n/locale-manager.ts — manages locale-specific system prompts and tool descriptions
  • New src/i18n/locales/en.json, ja.json, zh-CN.json, ko.json — translation maps for SDK-internal strings
const agent = createChatAgent({ locale: 'ja-JP' })
// System prompt automatically in Japanese, tool descriptions localized

6B. Locale-Aware Tools

  • Extend ToolContext with locale field so tools can format responses appropriately (dates, numbers, currencies)
  • Thread locale through AgentRunner → ToolExecutor → tool execution
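With locale available on the context, tools can lean on the platform Intl APIs rather than pulling in a formatting library. A hypothetical helper (the function name and signature are illustrative):

```typescript
// Format a currency amount for the agent's locale using the built-in
// Intl.NumberFormat API; `locale` would arrive via ToolContext.
function formatForLocale(locale: string, amount: number, currency: string): string {
  return new Intl.NumberFormat(locale, { style: 'currency', currency }).format(amount)
}

// JPY has no minor unit, so ja-JP output is rounded to whole yen,
// while en-US USD output keeps two fraction digits.
formatForLocale('ja-JP', 1234.5, 'JPY')
formatForLocale('en-US', 1234.5, 'USD')
```

The same Intl family (DateTimeFormat, NumberFormat) covers the dates and numbers cases mentioned above with no extra dependencies.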

6C. Character Encoding

  • Verify UTF-8/multibyte handling across all adapters (should already work)
  • Token counting awareness for CJK scripts in context management

Phase 7: Production Hardening + Examples

Goal: Production-ready with great onboarding.

  • Structured logging throughout (pluggable logger interface)
  • Error taxonomy: network errors vs tool errors vs LLM errors vs queue errors
  • Graceful shutdown: drain queue, finish in-flight requests, stop cron jobs
  • Health endpoint helper for container orchestration (Kubernetes readiness/liveness)
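The graceful-shutdown item typically hangs off SIGTERM/SIGINT. A sketch assuming the pause/drain and stop methods from Phases 4 and 5; those signatures are assumptions, not final APIs:

```typescript
// Illustrative shutdown sequence: stop scheduling, stop dequeuing,
// drain in-flight LLM requests, then exit.
interface Drainable { pause(): void; drain(): Promise<void> }
interface Stoppable { stopAll(): Promise<void> }

function installShutdownHooks(
  queue: Drainable,
  cron: Stoppable,
  exit: (code: number) => void = process.exit,
) {
  const shutdown = async () => {
    await cron.stopAll()   // no new scheduled runs
    queue.pause()          // stop dequeuing new LLM requests
    await queue.drain()    // let in-flight requests finish
    exit(0)
  }
  process.once('SIGTERM', shutdown)
  process.once('SIGINT', shutdown)
  return shutdown          // also returned for manual invocation in tests
}
```

Kubernetes sends SIGTERM before SIGKILL, so draining must finish within the pod's terminationGracePeriodSeconds.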

Examples

  • examples/05-vllm-quickstart.ts — chat agent on vLLM
  • examples/06-custom-tools.ts — tool packs + middleware
  • examples/07-cron-worker.ts — scheduled agent job
  • examples/08-i18n-agent.ts — multi-language agent
  • examples/09-queued-agents.ts — queue config + monitoring

Build Order

Phase 1 (vLLM + rebrand)       ✅ COMPLETE
  |
Phase 2 (presets + DX)          <- NEXT: Devs can start using it
  |
Phase 3 (tool packs)        \
                              >-- Can be parallelized
Phase 4 (request queuing)   /
  |
Phase 5 (cron scheduler)       <- Depends on queue (Phase 4)
  |
Phase 6 (i18n)                  <- Can start anytime after Phase 2
  |
Phase 7 (production hardening)  <- Final polish

Key Architectural Decisions

| Decision | Choice | Rationale |
| --- | --- | --- |
| vLLM adapter approach | Extend OpenAI adapter via shared openai-compat.ts | vLLM is OpenAI-compatible; avoids code duplication |
| Request queue placement | Transparent wrapper around LLMAdapter | Agents are unaware of queuing; zero code changes for consumers |
| Queue implementation | Priority queue + semaphore + token bucket | Handles concurrency, rate limits, and fairness in one layer |
| Config management | Constructor args > env vars > defaults (merge) | Flexible for different deployment contexts |
| Cron library | Lightweight internal parser (or cron-parser dep) | Avoids heavy dependencies |
| i18n approach | JSON locale files + template system | Simple, no heavy framework needed |
| Tool middleware | Function composition (decorator pattern) | Familiar, zero-dependency, composable |
| Presets | Factory functions returning standard Agent | No new class hierarchies, just opinionated config |