Multimodal Capabilities with Claude
Source: anthropics/claude-cookbooks/multimodal
Vision Capabilities
Getting Started with Images
- Location: multimodal/getting_started_with_vision.ipynb
- Topics: Image upload, analysis, OCR, visual question answering
Best Practices for Vision
- Location: multimodal/best_practices_for_vision.ipynb
- Topics: Image quality, prompt engineering for vision, error handling
Charts and Graphs
- Location: multimodal/reading_charts_graphs_powerpoints.ipynb
- Topics: Data extraction from charts, graph interpretation, PowerPoint analysis
Form Extraction
- Location: multimodal/how_to_transcribe_text.ipynb
- Topics: OCR, structured data extraction, form processing
Image Generation
Illustrated Responses
- Location: misc/illustrated_responses.ipynb
- Topics: Integration with Stable Diffusion, image generation prompts
Code Examples
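# Vision API example
import base64

import anthropic

client = anthropic.Anthropic()

# Read and base64-encode the image ("image.jpg" is a placeholder path)
with open("image.jpg", "rb") as f:
    image_base64 = base64.standard_b64encode(f.read()).decode("utf-8")

# Analyze an image
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": "image/jpeg",
                    "data": image_base64
                }
            },
            {
                "type": "text",
                "text": "What's in this image?"
            }
        ]
    }]
)

# Print the text of the reply
print(response.content[0].text)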
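The reply to a request like the one above arrives as a list of content blocks rather than a single string. The following is a minimal sketch for collecting the text, assuming each block exposes `type` and `text` attributes as the Python SDK's text blocks do. The helper name `collect_text` and the stand-in `fake_content` list are illustrative and not part of the cookbook.

```python
from types import SimpleNamespace


def collect_text(content_blocks):
    """Concatenate the text of every block whose type is "text"."""
    return "".join(
        block.text
        for block in content_blocks
        if getattr(block, "type", None) == "text"
    )


# Stand-in for response.content so the sketch runs without an API call
fake_content = [
    SimpleNamespace(type="text", text="A cat "),
    SimpleNamespace(type="text", text="on a sofa."),
]
print(collect_text(fake_content))
```

With a real response, the call would be `collect_text(response.content)`.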
Tips
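- Image Quality: Clear, sufficiently high-resolution images give better results; blurry or very small images make text and fine detail hard to read
- Prompt Clarity: State exactly what you want extracted or analyzed (for example, name the fields or the chart series you need)
- Format Support: JPEG, PNG, GIF, and WebP are supported
- Size Limits: Maximum 5 MB per image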
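The format and size tips can be checked locally before an image is sent. The sketch below is one possible pre-flight check, not the cookbook's own code: the helper name `build_image_block`, the extension-to-media-type table, and the reading of "5 MB" as 5 × 1024 × 1024 bytes of raw file size are all assumptions made for illustration.

```python
import base64
from pathlib import Path

# Assumed mapping from file extension to the media types listed in the tips
MEDIA_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}
# Assumption: the 5 MB limit is applied to the raw file size
MAX_IMAGE_BYTES = 5 * 1024 * 1024


def build_image_block(path):
    """Validate an image file and return a base64 image content block."""
    path = Path(path)
    media_type = MEDIA_TYPES.get(path.suffix.lower())
    if media_type is None:
        raise ValueError(f"Unsupported image format: {path.suffix!r}")

    data = path.read_bytes()
    if len(data) > MAX_IMAGE_BYTES:
        raise ValueError(f"Image is {len(data)} bytes; limit is {MAX_IMAGE_BYTES}")

    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.b64encode(data).decode("ascii"),
        },
    }
```

The returned dictionary has the same shape as the image entry in the message content shown under Code Examples, so it can be placed directly in the `content` list next to a text block.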