Llama 3.2 11B Vision Instruct
Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed to handle tasks combining visual and textual data. It excels in tasks such as image captioning and...
Capabilities:
Input
- Text input
- Image input (vision)
- File input (PDF)
- Audio input
- Video input
Output
- Text output
- Image output
- Audio output
Pricing & availability:
- OpenRouter: $0.345 input / $0.345 output per 1M tokens
Details:
- Provider: Meta
- Context window: 131K
- Input price: $0.345/M
- Output price: $0.345/M
- Open weights: Yes
- Knowledge cutoff: Dec 2023
- Released: Sep 2024
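
To make the per-million-token pricing concrete, here is a minimal sketch of a cost estimate using the listed figures. The function name `estimate_cost_usd` is hypothetical. Reading "131K" as 131,000 tokens and counting input and output against the same context window are both assumptions, not details confirmed by the listing. How image inputs are counted or billed is not covered here.

```python
# Prices as listed above (USD per 1M tokens).
INPUT_PRICE_PER_M = 0.345
OUTPUT_PRICE_PER_M = 0.345

# Assumption: "131K" is read as 131,000 tokens; the exact limit may differ.
CONTEXT_WINDOW_TOKENS = 131_000


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one text-only request at the listed prices.

    Hypothetical helper for illustration; not part of any provider SDK.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # Assumption: prompt and completion share the same context window.
    if input_tokens + output_tokens > CONTEXT_WINDOW_TOKENS:
        raise ValueError("request exceeds the assumed context window")
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


if __name__ == "__main__":
    # A 100,000-token prompt with a 1,000-token reply.
    print(f"${estimate_cost_usd(100_000, 1_000):.6f}")
```

Because the input and output prices are identical, the estimate depends only on the total token count, so a long prompt with a short reply costs the same as the reverse.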