vibe-coding-cn/skills/claude-cookbooks/references/multimodal.md

# Multimodal Capabilities with Claude

Source: anthropics/claude-cookbooks/multimodal

## Vision Capabilities

### Getting Started with Images
- **Location**: `multimodal/getting_started_with_vision.ipynb`
- **Topics**: Image upload, analysis, OCR, visual question answering

### Best Practices for Vision
- **Location**: `multimodal/best_practices_for_vision.ipynb`
- **Topics**: Image quality, prompt engineering for vision, error handling

### Charts and Graphs
- **Location**: `multimodal/reading_charts_graphs_powerpoints.ipynb`
- **Topics**: Data extraction from charts, graph interpretation, PowerPoint analysis

### Form Extraction
- **Location**: `multimodal/how_to_transcribe_text.ipynb`
- **Topics**: OCR, structured data extraction, form processing

## Image Generation

### Illustrated Responses
- **Location**: `misc/illustrated_responses.ipynb`
- **Topics**: Integration with Stable Diffusion, image generation prompts

## Code Examples

```python
# Vision API example
import anthropic

client = anthropic.Anthropic()

# Analyze an image
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": "image/jpeg",
                    "data": image_base64
                }
            },
            {
                "type": "text",
                "text": "What's in this image?"
            }
        ]
    }]
)
```

## Tips

1. **Image Quality**: Higher resolution images provide better results
2. **Prompt Clarity**: Be specific about what you want to extract or analyze
3. **Format Support**: JPEG, PNG, GIF, WebP supported
4. **Size Limits**: Max 5MB per image