Meta

Llama 3.2 11B Vision Instruct

Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed to handle tasks combining visual and textual data. It excels in tasks such as image captioning and...

Capabilities:

Input
  • Text input
  • Image input (vision)
Output
  • Text output
Pricing & availability:
  • OpenRouter: $0.345 input / $0.345 output per 1M tokens
Details:
  • Provider: Meta
  • Context window: 131K tokens
  • Input price: $0.345 per 1M tokens
  • Output price: $0.345 per 1M tokens
  • Open weights: Yes
  • Knowledge cutoff: Dec 2023
  • Released: Sep 2024
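
As a rough illustration of how the per-million-token prices listed above translate into the cost of a single request, here is a minimal Python sketch. The function name `estimate_cost` and the token counts are hypothetical, chosen only for the example; the prices are the ones shown in this listing, and actual billing may differ by provider.

```python
from decimal import Decimal

# Prices from the listing above, in USD per one million tokens.
INPUT_PRICE_PER_M = Decimal("0.345")
OUTPUT_PRICE_PER_M = Decimal("0.345")
TOKENS_PER_M = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one request (hypothetical helper)."""
    input_cost = Decimal(input_tokens) * INPUT_PRICE_PER_M
    output_cost = Decimal(output_tokens) * OUTPUT_PRICE_PER_M
    return (input_cost + output_cost) / TOKENS_PER_M


# Hypothetical request: 10,000 input tokens and 1,000 output tokens.
cost = estimate_cost(10_000, 1_000)
print(f"Estimated cost: ${cost}")
```

Decimal arithmetic is used here instead of floats so that small per-token amounts are not distorted by binary rounding.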