Local Embeddings Setup Guide
This guide explains how to set up local embeddings for the TradingAgents framework.
Why Local Embeddings?
When using LLM providers that don't support embeddings (like Anthropic), or when you want to avoid additional API costs, you need a local embedding solution.
Recommended: Run in Docker
The recommended approach is to run the embedding service in a Docker container. This keeps your main application environment clean and avoids installing heavy dependencies like PyTorch on your host machine.
1. Run the Embedding Service
Use the provided script to start the service:
./startEmbedding.sh
This runs Hugging Face Text Embeddings Inference (TEI), a high-performance server compatible with the OpenAI API.
(Note: The Go-based image clems4ever/all-minilm-l6-v2-go is a CLI tool and cannot simply be run as a server.)
2. Configure TradingAgents
Add (or update) these lines in your .env file:
# Point to your local embedding service (TEI supports /v1 API)
EMBEDDING_API_URL=http://localhost:11434/v1
# The model name configured in the start script
EMBEDDING_MODEL=all-MiniLM-L6-v2
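To double-check what the application will see, the two settings can be read back from the `.env` text. The sketch below is a minimal standard-library parser, not the loader TradingAgents itself uses (this guide does not show that); `parse_env` and `SAMPLE_ENV` are hypothetical names used only for illustration.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip optional surrounding quotes from the value.
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


# Hypothetical sample mirroring the .env lines from this guide.
SAMPLE_ENV = """\
# Point to your local embedding service (TEI supports /v1 API)
EMBEDDING_API_URL=http://localhost:11434/v1
# The model name configured in the start script
EMBEDDING_MODEL=all-MiniLM-L6-v2
"""

if __name__ == "__main__":
    for key, value in parse_env(SAMPLE_ENV).items():
        print(f"{key} = {value}")
```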
3. Verify Setup
Run the verification script:
python3 verify_local_embeddings.py
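If the verification script is unavailable, a manual check can be sketched as follows. It assumes the service exposes the OpenAI-style `POST /v1/embeddings` route and uses the URL and model name from the `.env` example; `build_request` and `embed` are hypothetical helper names, not part of TradingAgents or `verify_local_embeddings.py`.

```python
import json
import urllib.request


def build_request(base_url: str, model: str, texts: list[str]) -> urllib.request.Request:
    """Build an OpenAI-style embeddings request (nothing is sent yet)."""
    payload = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(base_url: str, model: str, texts: list[str], timeout: float = 10.0) -> list[list[float]]:
    """Send the request and return one vector per input text."""
    with urllib.request.urlopen(build_request(base_url, model, texts), timeout=timeout) as resp:
        body = json.load(resp)
    # OpenAI-style responses list results under "data"; sort by "index" to keep input order.
    return [item["embedding"] for item in sorted(body["data"], key=lambda d: d["index"])]


if __name__ == "__main__":
    try:
        vectors = embed("http://localhost:11434/v1", "all-MiniLM-L6-v2", ["hello world"])
        print(f"OK: received {len(vectors)} vector(s) of dimension {len(vectors[0])}")
    except (OSError, ValueError, KeyError) as exc:
        # OSError: service unreachable or HTTP error; ValueError/KeyError: unexpected response body.
        print(f"Embedding check failed: {exc!r}")
```

A successful run prints the number of vectors and their dimension. A failure usually means the container is not running or the port in `EMBEDDING_API_URL` does not match the one used by the start script.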
Alternative: Local Installation (Development Only)
If you prefer to run everything locally without Docker (e.g., for development), you can install the library directly.
⚠️ Warning: This adds ~500MB of PyTorch dependencies to your environment.
1. Install Dependencies
pip install sentence-transformers
2. Configure TradingAgents
If you don't set EMBEDDING_API_URL, the system will attempt to import sentence-transformers automatically when using Anthropic.
# Optional: Force local provider
EMBEDDING_PROVIDER=local
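The selection rule described above can be sketched roughly as below. This illustrates the documented behaviour and is not the framework's actual code: `choose_backend` and `embed_locally` are hypothetical names, the precedence of `EMBEDDING_PROVIDER` over `EMBEDDING_API_URL` is an assumption, and the sketch ignores which LLM provider is active.

```python
def choose_backend(env: dict[str, str]) -> str:
    """Return "local" for in-process sentence-transformers, "service" for an HTTP endpoint."""
    # Assumption: an explicit EMBEDDING_PROVIDER=local wins over any configured URL.
    if env.get("EMBEDDING_PROVIDER", "").strip().lower() == "local":
        return "local"
    # Without a service URL, fall back to the local library.
    if not env.get("EMBEDDING_API_URL", "").strip():
        return "local"
    return "service"


def embed_locally(texts: list[str], model_name: str = "all-MiniLM-L6-v2") -> list[list[float]]:
    """Embed texts in-process; requires `pip install sentence-transformers`."""
    # Imported lazily so the heavy PyTorch dependency loads only when it is needed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    return model.encode(texts).tolist()
```

Calling `choose_backend(dict(os.environ))` after loading the `.env` file would show which path a given configuration takes under these assumptions.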
Supported Providers
| LLM Provider | Default Behavior | Recommended Setup |
|---|---|---|
| Anthropic | Tries local service URL | Docker Service |
| Ollama | Uses Ollama API | Ensure Ollama is running |
| OpenAI | Uses OpenAI API | No setup needed |
| Google | Uses Google API | No setup needed |
FAQ
Q: Why Docker?
A: sentence-transformers requires PyTorch, which is a very large dependency (~500MB+). Putting it in a container keeps your main application lightweight and portable.
Q: Can I use GPU?
A: Yes! Use the GPU version of the container: ghcr.io/huggingface/text-embeddings-inference:latest (requires NVIDIA Container Toolkit).
Q: Can I use Ollama instead?
A: Yes. Set EMBEDDING_API_URL=http://localhost:11434/v1 and EMBEDDING_MODEL=nomic-embed-text (or your preferred Ollama model). Port 11434 is Ollama's default, so stop the TEI container first if it is bound to the same port.