
Bytedance: UI-TARS 72B

bytedance-research/ui-tars-72b
Context Window: 33K
Online

UI-TARS 72B is an open-source multimodal model designed to automate browser and desktop tasks through visual interaction and control. Its specialized vision architecture lets it interpret on-screen visual data accurately and act on it. It supports automation in web browsers and in desktop applications, including Microsoft Office and VS Code. Core capabilities include intelligent screen detection, predictive action modeling, and efficient handling of repetitive interactions. The model is trained with supervised fine-tuning (SFT) tailored to computer-control scenarios. It can be deployed locally or tried as a demo via Hugging Face. Intended use cases include workflow automation, task scripting, and interactive desktop control.

Capabilities

Vision, Text Generation, Code Generation, Analysis & Reasoning

Technical Specs

Input Modality: Text, Image
Output Modality: Text
Pricing

Pay per use, no monthly fees
Input Token: < ¥0.001 per 1K tokens
Output Token: < ¥0.001 per 1K tokens
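
Because the listed prices are upper bounds ("< ¥0.001"), the most a request should cost can be estimated by treating ¥0.001 per 1K tokens as a ceiling. The sketch below does that arithmetic. The function name and the idea of using the listed figure as a ceiling are assumptions for illustration; actual billing may be lower and may round differently.

```python
# Assumption: the listed "< ¥0.001 per 1K tokens" is used as a ceiling
# for both input and output tokens. Real charges may be lower.
PRICE_CEILING_PER_1K = 0.001  # in ¥, as shown in the pricing list


def cost_ceiling(input_tokens: int, output_tokens: int) -> float:
    """Return an upper bound (in ¥) on the cost of one request."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total_tokens = input_tokens + output_tokens
    return total_tokens / 1000 * PRICE_CEILING_PER_1K


# Example: a 20,000-token prompt with a 500-token reply.
print(f"{cost_ceiling(20_000, 500):.4f}")
```

Under this assumption, a request that nearly fills the 33K context window stays below roughly ¥0.033.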

Quick Start

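from openai import OpenAI

# Point the OpenAI-compatible client at the UnionToken endpoint.
client = OpenAI(
    base_url="https://api.uniontoken.ai/v1",
    api_key="YOUR_UNIONTOKEN_API_KEY",  # replace with your own key
)

# Send a single-turn chat request to the model.
response = client.chat.completions.create(
    model="bytedance-research/ui-tars-72b",
    messages=[
        {"role": "user", "content": "Hello!"}
    ],
)

# Print the text of the first returned choice.
print(response.choices[0].message.content)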
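
Since the model accepts image input, a screenshot can be sent alongside a text instruction. The sketch below uses the OpenAI SDK's `image_url` content-part format with a base64 data URL. Whether this endpoint accepts that format is an assumption, and the helper names `build_screenshot_message` and `ask_about_screenshot` are hypothetical.

```python
import base64


def build_screenshot_message(png_bytes: bytes, instruction: str) -> dict:
    """Build one user message holding a text instruction and a PNG image."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": instruction},
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{encoded}"},
            },
        ],
    }


def ask_about_screenshot(path: str, instruction: str) -> str:
    """Send a screenshot file and an instruction; return the model's reply.

    Assumes the endpoint accepts OpenAI-style image content parts.
    """
    from openai import OpenAI  # imported here so the builder works without the SDK

    client = OpenAI(
        base_url="https://api.uniontoken.ai/v1",
        api_key="YOUR_UNIONTOKEN_API_KEY",
    )
    with open(path, "rb") as f:
        message = build_screenshot_message(f.read(), instruction)
    response = client.chat.completions.create(
        model="bytedance-research/ui-tars-72b",
        messages=[message],
    )
    return response.choices[0].message.content
```

A call such as `ask_about_screenshot("screen.png", "Where is the Save button?")` would return the model's text reply. Keep in mind that the image counts toward the 33K context window.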
