UI-TARS 7B
UI-TARS-1.5 is a multimodal vision-language agent optimized for GUI-based environments, including desktop interfaces, web browsers, mobile systems, and games. Developed by ByteDance, it extends the UI-TARS framework with reinforcement...
Capabilities:
Input
- Text input
- Image input (vision)
- File input (PDF)
- Audio input
- Video input
Output
- Text output
- Image output
- Audio output
Pricing & availability:
OpenRouter
$0.1 input / $0.2 output per M tokens
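As a rough illustration of what these rates mean per request, the sketch below estimates cost from token counts. It assumes "per M" means per million tokens and that the first figure is the input rate and the second the output rate. The function and constant names are hypothetical, and actual billing on OpenRouter may differ.

```python
# Listed rates in USD per million tokens (assumed interpretation of "per M").
INPUT_PRICE_PER_M = 0.10
OUTPUT_PRICE_PER_M = 0.20


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for a single request."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: a screenshot-heavy prompt and a short action reply.
    cost = estimate_cost(input_tokens=50_000, output_tokens=2_000)
    print(f"Estimated cost: ${cost:.4f}")
```

At these rates, input tokens dominate the cost for typical GUI-agent requests, where prompts carry screenshots and responses are short action strings.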
Details:
- Provider: ByteDance
- Context window: 128K
- Input price: $0.1/M
- Output price: $0.2/M
- Open weights: Yes
- Knowledge cutoff: Jan 2025
- Released: Jul 2025