inceptionChat
Inception: Mercury 2
inception/mercury-2
128KContext Window
50KMax Output
Supported Protocols:reasoninginclude_reasoningmax_tokensstoptemperaturetoolstool_choiceresponse_formatstructured_outputs
Online
Mercury 2 is an extremely fast reasoning LLM, and the first reasoning diffusion LLM (dLLM). Instead of generating tokens sequentially, Mercury 2 produces and refines multiple tokens in parallel, achieving >1,000 tokens/sec on standard GPUs. Mercury 2 is 5x+ faster than leading speed-optimized LLMs like Claude 4.5 Haiku and GPT 5 Mini, at a fraction of the cost. Mercury 2 supports tunable reasoning levels, 128K context, native tool use, and schema-aligned JSON output. Built for coding workflows where latency compounds, real-time voice/search, and agent loops. OpenAI API compatible. Read more in the [blog post](https://www.inceptionlabs.ai/blog/introducing-mercury-2).
Capabilities
🧠 Reasoning🔧 Function CallingText GenerationCode GenerationAnalysis & Reasoningmodels.reasoning
Technical Specs
Input Modality
Text
Output Modality
Text
Arch
—
Default Temperature
0.75
Pricing
Pay per use, no monthly feesInput Token< ¥0.001/1K Token
Output Token< ¥0.001/1K Token
Quick Start
from openai import OpenAI
client = OpenAI(
base_url="https://api.uniontoken.ai/v1",
api_key="YOUR_UNIONTOKEN_API_KEY",
)
response = client.chat.completions.create(
model="inception/mercury-2",
messages=[
{"role": "user", "content": "Hello!"}
],
)
print(response.choices[0].message.content)FAQ
Inception: Mercury 2
inception/mercury-2
In< ¥0.001/1K
Out< ¥0.001/1K
Context Window128K
Max Output50K
Related Models
View All → →Ready to get started?
Get 1M free tokens on registration, no monthly fees or minimum spend
Register Now →