67 lines
1.8 KiB
Markdown
67 lines
1.8 KiB
Markdown
# Multimodal Capabilities with Claude
|
|
|
|
Source: anthropics/claude-cookbooks/multimodal
|
|
|
|
## Vision Capabilities
|
|
|
|
### Getting Started with Images
|
|
- **Location**: `multimodal/getting_started_with_vision.ipynb`
|
|
- **Topics**: Image upload, analysis, OCR, visual question answering
|
|
|
|
### Best Practices for Vision
|
|
- **Location**: `multimodal/best_practices_for_vision.ipynb`
|
|
- **Topics**: Image quality, prompt engineering for vision, error handling
|
|
|
|
### Charts and Graphs
|
|
- **Location**: `multimodal/reading_charts_graphs_powerpoints.ipynb`
|
|
- **Topics**: Data extraction from charts, graph interpretation, PowerPoint analysis
|
|
|
|
### Form Extraction
|
|
- **Location**: `multimodal/how_to_transcribe_text.ipynb`
|
|
- **Topics**: OCR, structured data extraction, form processing
|
|
|
|
## Image Generation
|
|
|
|
### Illustrated Responses
|
|
- **Location**: `misc/illustrated_responses.ipynb`
|
|
- **Topics**: Integration with Stable Diffusion, image generation prompts
|
|
|
|
## Code Examples
|
|
|
|
```python
|
|
# Vision API example
|
|
import anthropic
|
|
|
|
client = anthropic.Anthropic()
|
|
|
|
# Analyze an image
|
|
response = client.messages.create(
|
|
model="claude-3-5-sonnet-20241022",
|
|
max_tokens=1024,
|
|
messages=[{
|
|
"role": "user",
|
|
"content": [
|
|
{
|
|
"type": "image",
|
|
"source": {
|
|
"type": "base64",
|
|
"media_type": "image/jpeg",
|
|
"data": image_base64
|
|
}
|
|
},
|
|
{
|
|
"type": "text",
|
|
"text": "What's in this image?"
|
|
}
|
|
]
|
|
}]
|
|
)
|
|
```
|
|
|
|
## Tips
|
|
|
|
1. **Image Quality**: Higher resolution images provide better results
|
|
2. **Prompt Clarity**: Be specific about what you want to extract or analyze
|
|
3. **Format Support**: JPEG, PNG, GIF, WebP supported
|
|
4. **Size Limits**: Max 5MB per image
|