AgentPool now maintains a per-agent Semaphore(1) that serializes
concurrent run() calls targeting the same Agent. This prevents
shared-state races on Agent.state (status, messages, tokenUsage)
when multiple independent tasks are assigned to the same agent.
Lock acquisition order: per-agent lock first, then pool semaphore,
so tasks queued behind a busy agent don't hold a pool slot while
they wait.
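
A minimal sketch of the acquisition order described above, not the
actual implementation. The names Semaphore, SketchPool, and Runnable,
and keying locks by agent name, are assumptions made for illustration.

```typescript
// Hypothetical counting semaphore; the real class may differ.
class Semaphore {
  private available: number;
  private readonly waiters: Array<() => void> = [];

  constructor(permits: number) {
    this.available = permits;
  }

  async acquire(): Promise<void> {
    if (this.available > 0) {
      this.available -= 1;
      return;
    }
    // No permit free: park until release() hands one over directly.
    await new Promise<void>((resolve) => this.waiters.push(resolve));
  }

  release(): void {
    const next = this.waiters.shift();
    if (next) {
      next(); // transfer the permit to the oldest waiter
    } else {
      this.available += 1;
    }
  }
}

// Hypothetical stand-in for Agent: only what the sketch needs.
interface Runnable {
  name: string;
  run(prompt: string): Promise<string>;
}

class SketchPool {
  private readonly poolSlots: Semaphore;
  // Assumption: one Semaphore(1) per agent, keyed by agent name.
  private readonly agentLocks = new Map<string, Semaphore>();

  constructor(maxConcurrency: number) {
    this.poolSlots = new Semaphore(maxConcurrency);
  }

  private lockFor(agent: Runnable): Semaphore {
    let lock = this.agentLocks.get(agent.name);
    if (!lock) {
      lock = new Semaphore(1);
      this.agentLocks.set(agent.name, lock);
    }
    return lock;
  }

  async run(agent: Runnable, prompt: string): Promise<string> {
    const agentLock = this.lockFor(agent);
    await agentLock.acquire(); // 1. wait for the agent, holding no pool slot
    try {
      await this.poolSlots.acquire(); // 2. only then take a pool slot
      try {
        return await agent.run(prompt);
      } finally {
        this.poolSlots.release();
      }
    } finally {
      agentLock.release();
    }
  }
}
```

With the order reversed, a task waiting on a busy agent would sit on
a pool slot it cannot use, which could starve tasks for idle agents.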