Powered by Arcee AI

Trinity Large Thinking

  • Text Generation

Trinity Large Thinking is Arcee AI’s open-weight, 398–400B-parameter sparse Mixture-of-Experts model focused on advanced reasoning and long-horizon agentic tasks. It is notable for activating only about 13B parameters per token while emitting explicit chain-of-thought traces for tool-using agents.

Start Using API

What is Trinity Large Thinking?

Trinity Large Thinking is an open-source, reasoning-focused sparse Mixture-of-Experts language model from Arcee AI with roughly 398–400 billion total parameters and about 13 billion active per token. It is primarily used for complex, long-horizon AI agents, multi-step tool calling, and transparent chain-of-thought reasoning in production workflows. It is also applied to long-context tasks such as RAG, structured outputs, and coding assistance where its large context window and explicit thinking blocks are valuable. The model belongs to Arcee AI’s Trinity Large family and is built on top of Trinity Large Base as its reasoning-specialized variant.

5 Core Capabilities

  • Advanced Reasoning

    Sparse Mixture-of-Experts model specialized for complex, long-horizon reasoning, planning, and multi-step decision-making across diverse knowledge domains.

  • Agentic Workflows

    Optimized for autonomous agents, multi-turn tool use, and orchestrating extended reasoning traces for production-grade AI agent pipelines.

  • Long-Context Handling

    Supports very long context windows (around 256K tokens), enabling analysis of large documents, multi-step traces, and extensive conversations.

  • Code Generation

    Delivers strong performance on coding benchmarks like LiveCodeBench, suitable for complex, multi-step programming and software engineering tasks.

  • Multilingual Text

    Handles multilingual text inputs and outputs, making it suitable for global applications requiring reasoning and generation across several languages.

6 Most Valuable Use Cases

  • Agentic workflow planning
  • Complex code generation
  • Long document reasoning
  • Tool-using AI agents
  • Enterprise customization
  • Benchmarking and evaluation

Cost Comparison

LLM API offers the lowest cost and highest performance for Trinity Large–class reasoning models.

Provider Region Latency Throughput Uptime Input ($/1M) Output ($/1M) Context
LLM API BEST Global ~140ms ~80 tps ~99.99% ~$0.70 ~$0.70 ~256K
Arcee AI Global ~220ms ~40 tps ~99.9% ~$2.50 ~$2.50 ~128K
OpenAI (o3-mini equivalent) Global ~260ms ~35 tps ~99.9% ~$1.10 ~$4.40 ~200K
Anthropic (Claude 3.7 Sonnet thinking) Global ~280ms ~30 tps ~99.9% ~$3.00 ~$15.00 ~200K
Google (Gemini 2.0 Pro thinking) Global ~240ms ~45 tps ~99.9% ~$1.50 ~$6.00 ~200K

Technical Specifications

Metric Trinity Large Thinking OpenAI o3 Anthropic Claude Sonnet 4
Avg Latency ~1.8s ~2.5s ~2.2s
Context Window 262K 200K 200K
Input Price ($/1M) $0.22 $2.00 $3.00
Output Price ($/1M) $0.80 $8.00 $15.00
Max Output Tokens 80K 32K 8K
Throughput ~40 tps ~20 tps ~18 tps
Uptime 99.9% 99.9% 99.9%

30-day usage via LLM API

12.5B
Prompt tokens processed (last 30 days)
145M
Completion tokens generated (30 days)
3.1M
API requests served (30 days)
99.8%
Avg uptime over last 30 days
Start Using API

Why Build on LLM.API?

One unified API. Every major model. Built-in reliability, cost control, and observability.

  • Intelligent Model Routing

    Automatically route each request to the optimal model across providers based on latency, capability, and constraints—no client changes or custom logic required.

    One endpoint, every model.
  • Cost-Aware Orchestration

    Enforce per-project and per-request cost controls while dynamically selecting cheaper equivalents, so you stay within budget without manually tuning every call.

    Optimize spend by default.
  • Resilient Fallback Flows

    Define automatic cross-provider fallbacks when a model is slow, degraded, or unavailable, keeping production traffic flowing without brittle retries or custom failover code.

    No single point of failure.
  • Full-Stack Observability

    Get unified traces, metrics, and logs for every AI call—across vendors—so you can debug prompts, compare models, and track performance from one dashboard.

    See every token, everywhere.
  • Task-Level Abstractions

    Describe tasks—chat, tools, RAG, classification—once and let LLM.API handle providers, schemas, and quirks so you ship features instead of glue code.

    Code to tasks, not models.
  • High-Throughput Batch Jobs

    Run massive batch inference across providers with automatic parallelization, rate-limit handling, and retries, turning slow offline jobs into predictable pipelines.

    Batch at production scale.

When to Use — When NOT to Use

Use it if...

  • You need strong multi-step reasoning for agentic workflows, planning, and complex decision pipelines.
  • You need long-context processing, like 200K+ token transcripts, logs, or technical documents.
  • Your use case involves building autonomous coding or debugging agents with reliable tool use.
  • You need transparent chain-of-thought style traces for auditable, stepwise enterprise reasoning.
  • Your use case involves self-hosting or fine-tuning an Apache-2.0-licensed open-weights reasoning model.
  • You need high reasoning benchmarks at lower cost than frontier closed-source reasoning models.
  • Your use case involves experimental orchestration of multiple tools and APIs across long multi-turn sessions.

Avoid if...

  • You need ultra-low-latency real-time chat or streaming responses with minimal reasoning overhead.
  • Your workload requires strict output brevity without verbose reasoning traces or intermediate thoughts.
  • You need lightweight on-device inference on CPUs or small GPUs with tight memory limits.
  • Your workload requires multimodal vision, image understanding, or audio capabilities not provided by Trinity.
  • You need a simple, fast, general-purpose assistant where deep reasoning is unnecessary overkill.
  • Your workload requires hardened, compliance-certified enterprise hosting where Arcee’s stack is unavailable.
  • You need predictable, very low token usage budgets instead of long, exploratory reasoning outputs.

Frequently Asked Questions

  • What is Trinity Large Thinking?

    Trinity Large Thinking is a large language model by Arcee AI optimized for complex reasoning, analysis, and multi-step problem solving.

  • What is Trinity Large Thinking best suited for?

    It is best for long-form reasoning tasks like code review, technical design discussions, research synthesis, and stepwise planning where depth of analysis matters.

  • What is the context window of Trinity Large Thinking?

    Trinity Large Thinking supports a large context window suitable for multi-document prompts, but exact token limits depend on the LLM.API deployment configuration.

  • How fast is Trinity Large Thinking in terms of latency?

    Latency varies by LLM.API region and load, but Trinity Large Thinking is tuned for interactive use rather than ultra-low-latency streaming workloads.

  • What modalities does Trinity Large Thinking support?

    Trinity Large Thinking is a text-only model focused on natural language input and output, without native image or audio processing.

  • How do I access Trinity Large Thinking through LLM.API?

    You can call Trinity Large Thinking via LLM.API by specifying the provider as Arcee AI and the model name as Trinity Large Thinking in requests.

  • How is Trinity Large Thinking priced on LLM.API?

    Pricing for Trinity Large Thinking is usage-based on input and output tokens, with exact rates listed on the LLM.API pricing page for Arcee AI models.

  • How does Trinity Large Thinking compare to smaller Trinity models?

    Compared to smaller Trinity variants, Trinity Large Thinking offers stronger reasoning and accuracy at higher compute cost and slightly increased latency.

  • What are the main limitations of Trinity Large Thinking?

    It can still hallucinate facts, struggle with very recent information, and is not suitable for real-time control or safety-critical decision-making without oversight.

  • Can I use Trinity Large Thinking for tool calling or function calling via LLM.API?

    Yes, as long as your LLM.API integration defines tools or functions, Trinity Large Thinking can follow structured tool-calling schemas in prompts.

Start in 2 lines of code

Get My API Key