Inception: Mercury 2

inception/mercury-2

128K Context Window

50K Max Output
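The two limits above interact when you choose `max_tokens` for a request. The sketch below makes two assumptions that this card does not state: "128K" means 128,000 tokens, and the prompt plus the generated output must fit inside the context window. `clamp_max_tokens` is a hypothetical helper, and the prompt token count has to come from your own estimate.

```python
# Assumptions (not stated on this card): 128K = 128,000 tokens, and the
# prompt plus the generated output must fit inside the context window.
CONTEXT_WINDOW = 128_000
MAX_OUTPUT = 50_000


def clamp_max_tokens(prompt_tokens: int, requested: int) -> int:
    """Hypothetical helper: return a max_tokens value within both assumed limits."""
    remaining = CONTEXT_WINDOW - prompt_tokens
    if remaining <= 0:
        raise ValueError("prompt alone exceeds the assumed context window")
    return min(requested, MAX_OUTPUT, remaining)


print(clamp_max_tokens(prompt_tokens=1_000, requested=60_000))    # 50000
print(clamp_max_tokens(prompt_tokens=100_000, requested=50_000))  # 28000
```

With a short prompt, the 50K output cap is the binding limit; with a long prompt, the remaining context is.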

Supported Protocols: reasoning, include_reasoning, max_tokens, stop, temperature, tools, tool_choice, response_format, structured_outputs
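One way to use this list is to check a request's optional parameters against it before sending. This is a minimal sketch that assumes each entry is a literal request field name; some entries, such as `structured_outputs`, may describe a capability rather than a field, and the gateway may accept or ignore parameters not listed here. `build_request` is a hypothetical helper. Note that `reasoning` and `include_reasoning` are not standard keyword arguments of the OpenAI Python SDK's `chat.completions.create`, so they would have to go through `extra_body`.

```python
# Parameter names copied from the "Supported Protocols" list on this card.
SUPPORTED_PARAMS = frozenset({
    "reasoning", "include_reasoning", "max_tokens", "stop", "temperature",
    "tools", "tool_choice", "response_format", "structured_outputs",
})


def build_request(model: str, messages: list, **params) -> dict:
    """Hypothetical helper: reject optional parameters not listed on the card."""
    unsupported = sorted(set(params) - SUPPORTED_PARAMS)
    if unsupported:
        raise ValueError(f"unsupported parameters: {unsupported}")
    return {"model": model, "messages": messages, **params}


req = build_request(
    "inception/mercury-2",
    [{"role": "user", "content": "Hello!"}],
    temperature=0.2,
    max_tokens=512,
    stop=["\n\n"],
)
print(sorted(req))  # ['max_tokens', 'messages', 'model', 'stop', 'temperature']
```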


Mercury 2 is an extremely fast reasoning LLM and the first reasoning diffusion LLM (dLLM). Instead of generating tokens one at a time, Mercury 2 produces and refines multiple tokens in parallel, reaching more than 1,000 tokens/sec on standard GPUs. It is at least 5x faster than leading speed-optimized LLMs such as Claude Haiku 4.5 and GPT-5 mini, at a fraction of the cost. Mercury 2 supports tunable reasoning levels, a 128K context window, native tool use, and schema-aligned JSON output. It is built for latency-sensitive workloads: coding workflows where delays compound, real-time voice and search, and agent loops. The API is OpenAI-compatible. Read more in the [blog post](https://www.inceptionlabs.ai/blog/introducing-mercury-2).
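Because the API is described as OpenAI-compatible and `response_format` is listed as supported, a schema-aligned JSON request would plausibly use OpenAI's `json_schema` response format. This is a sketch under that assumption (the card does not confirm the exact shape). `TASK_SCHEMA`, `RESPONSE_FORMAT`, and `parse_task` are hypothetical names, and the local check is a lightweight guard, not a full JSON Schema validator.

```python
import json

# Hypothetical schema for illustration.
TASK_SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "priority": {"type": "integer"},
    },
    "required": ["title", "priority"],
    "additionalProperties": False,
}

# OpenAI-style structured-output field (assumed to be accepted by this endpoint).
RESPONSE_FORMAT = {
    "type": "json_schema",
    "json_schema": {"name": "task", "schema": TASK_SCHEMA, "strict": True},
}


def parse_task(content: str) -> dict:
    """Parse the reply, then check required keys and the integer field."""
    data = json.loads(content)
    missing = [key for key in TASK_SCHEMA["required"] if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    if type(data["priority"]) is not int:
        raise TypeError("priority must be an integer")
    return data


# With a client configured as in Quick Start:
# response = client.chat.completions.create(
#     model="inception/mercury-2",
#     messages=[{"role": "user", "content": "Turn this bug report into a task: ..."}],
#     response_format=RESPONSE_FORMAT,
# )
# task = parse_task(response.choices[0].message.content)

# Offline check with a hand-written reply:
print(parse_task('{"title": "Fix login bug", "priority": 2}'))
```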

Capabilities

🧠 Reasoning, 🔧 Function Calling, Text Generation, Code Generation, Analysis & Reasoning
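For function calling, an OpenAI-compatible endpoint would take tool definitions through the `tools` parameter and return `tool_calls` whose arguments arrive as a JSON string. This is a minimal sketch assuming the OpenAI tool format is accepted unchanged; `get_utc_offset`, `TOOLS`, `REGISTRY`, and `run_tool_call` are hypothetical names for illustration. The local dispatcher runs offline, and the commented loop shows where it would plug into a real request.

```python
import json


def get_utc_offset(zone: str) -> str:
    """Hypothetical local tool: look up a fixed UTC offset for a zone abbreviation."""
    offsets = {"UTC": "+00:00", "JST": "+09:00"}
    return offsets.get(zone, "unknown")


# OpenAI-style function tool definition, passed as tools=TOOLS in the request.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_utc_offset",
            "description": "Return the UTC offset for a time zone abbreviation.",
            "parameters": {
                "type": "object",
                "properties": {"zone": {"type": "string"}},
                "required": ["zone"],
            },
        },
    }
]

# Map tool names the model may return to local Python callables.
REGISTRY = {"get_utc_offset": get_utc_offset}


def run_tool_call(name: str, arguments: str) -> str:
    """Execute one tool call; `arguments` is the JSON string produced by the model."""
    if name not in REGISTRY:
        raise KeyError(f"unknown tool: {name}")
    return REGISTRY[name](**json.loads(arguments))


# With a Quick Start client, one round of the loop would look roughly like:
# response = client.chat.completions.create(
#     model="inception/mercury-2", messages=messages, tools=TOOLS)
# msg = response.choices[0].message
# messages.append(msg.model_dump(exclude_none=True))  # keep the assistant turn
# for call in msg.tool_calls or []:
#     result = run_tool_call(call.function.name, call.function.arguments)
#     messages.append({"role": "tool", "tool_call_id": call.id, "content": result})

print(run_tool_call("get_utc_offset", '{"zone": "JST"}'))  # +09:00
```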

Technical Specs

Input Modality

Text

Output Modality

Text

Architecture

—

Default Temperature

0.75

Pricing

Pay per use, no monthly fees

Input tokens: < ¥0.001 per 1K tokens

Output tokens: < ¥0.001 per 1K tokens
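Since both rates are listed only as upper bounds, the most you can say in advance is a worst-case cost. This sketch computes that bound from the listed ¥0.001 per 1K tokens; the actual charge should be lower. `cost_upper_bound` is a hypothetical helper, and `Decimal` avoids floating-point rounding noise in the result.

```python
from decimal import Decimal

# Upper bounds from the card: both rates are listed as "< ¥0.001 per 1K tokens".
INPUT_RATE_PER_1K = Decimal("0.001")
OUTPUT_RATE_PER_1K = Decimal("0.001")


def cost_upper_bound(input_tokens: int, output_tokens: int) -> Decimal:
    """Worst-case cost in ¥ for one request; the real charge should be lower."""
    return (
        Decimal(input_tokens) / 1000 * INPUT_RATE_PER_1K
        + Decimal(output_tokens) / 1000 * OUTPUT_RATE_PER_1K
    )


# 200K input tokens plus 50K output tokens stays below ¥0.25.
print(cost_upper_bound(200_000, 50_000))  # 0.250
```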

Quick Start

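```python
from openai import OpenAI

# Point the standard OpenAI client at the UnionToken gateway.
client = OpenAI(
    base_url="https://api.uniontoken.ai/v1",
    api_key="YOUR_UNIONTOKEN_API_KEY",  # replace with your own key
)

# Send a single-turn chat request to Mercury 2.
response = client.chat.completions.create(
    model="inception/mercury-2",
    messages=[
        {"role": "user", "content": "Hello!"}
    ],
)

# Print the text of the first (and only) choice.
print(response.choices[0].message.content)
```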
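The card lists `reasoning` and `include_reasoning` and mentions tunable reasoning levels, but does not document their format. The OpenAI Python SDK forwards fields it does not define through `extra_body`, which is merged into the JSON request body. This sketch assumes an OpenRouter-style `{"effort": "low" | "medium" | "high"}` object; that shape and those values are unverified assumptions, so check the provider's documentation before relying on them. `reasoning_kwargs` is a hypothetical helper.

```python
# Assumption: the gateway accepts an OpenRouter-style reasoning object.
# Neither the field shape nor the allowed effort values are documented on this card.
ALLOWED_EFFORTS = ("low", "medium", "high")


def reasoning_kwargs(prompt: str, effort: str = "low", include_reasoning: bool = False) -> dict:
    """Hypothetical helper: build keyword arguments for client.chat.completions.create."""
    if effort not in ALLOWED_EFFORTS:
        raise ValueError(f"effort must be one of {ALLOWED_EFFORTS}")
    return {
        "model": "inception/mercury-2",
        "messages": [{"role": "user", "content": prompt}],
        # Fields the SDK does not define are sent via extra_body.
        "extra_body": {
            "reasoning": {"effort": effort},
            "include_reasoning": include_reasoning,
        },
    }


kwargs = reasoning_kwargs("Summarize this diff.", effort="high")
# response = client.chat.completions.create(**kwargs)
print(kwargs["extra_body"])  # {'reasoning': {'effort': 'high'}, 'include_reasoning': False}
```

A lower effort should favor latency, which matches the agent-loop and real-time use cases described above; a higher effort trades some of that speed for more deliberate answers.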