Powered by Arcee AI
Trinity Large Thinking
- Text Generation
Trinity Large Thinking is Arcee AI’s open-weight, 398–400B-parameter sparse Mixture-of-Experts model focused on advanced reasoning and long-horizon agentic tasks. It is notable for activating only about 13B parameters per token while emitting explicit chain-of-thought traces for tool-using agents.
About the model
What is Trinity Large Thinking?
Trinity Large Thinking is an open-source, reasoning-focused sparse Mixture-of-Experts language model from Arcee AI with roughly 398–400 billion total parameters and about 13 billion active per token. It is primarily used for complex, long-horizon AI agents, multi-step tool calling, and transparent chain-of-thought reasoning in production workflows. It is also applied to long-context tasks such as RAG, structured outputs, and coding assistance where its large context window and explicit thinking blocks are valuable. The model belongs to Arcee AI’s Trinity Large family and is built on top of Trinity Large Base as its reasoning-specialized variant.
Model capabilities
5 Core Capabilities
-
Advanced Reasoning
Sparse Mixture-of-Experts model specialized for complex, long-horizon reasoning, planning, and multi-step decision-making across diverse knowledge domains.
-
Agentic Workflows
Optimized for autonomous agents, multi-turn tool use, and orchestrating extended reasoning traces for production-grade AI agent pipelines.
-
Long-Context Handling
Supports very long context windows (around 256K tokens), enabling analysis of large documents, multi-step traces, and extensive conversations.
-
Code Generation
Delivers strong performance on coding benchmarks like LiveCodeBench, suitable for complex, multi-step programming and software engineering tasks.
-
Multilingual Text
Handles multilingual text inputs and outputs, making it suitable for global applications requiring reasoning and generation across several languages.
Use cases
6 Most Valuable Use Cases
- Agentic workflow planning
- Complex code generation
- Long document reasoning
- Tool-using AI agents
- Enterprise customization
- Benchmarking and evaluation
Transparent pricing
Cost Comparison
LLM API offers the lowest cost and highest performance for Trinity Large–class reasoning models.
| Provider | Region | Latency | Throughput | Uptime | Input ($/1M) | Output ($/1M) | Context |
|---|---|---|---|---|---|---|---|
| LLM API BEST | Global | ~140ms | ~80 tps | ~99.99% | ~$0.70 | ~$0.70 | ~256K |
| Arcee AI | Global | ~220ms | ~40 tps | ~99.9% | ~$2.50 | ~$2.50 | ~128K |
| OpenAI (o3-mini equivalent) | Global | ~260ms | ~35 tps | ~99.9% | ~$1.10 | ~$4.40 | ~200K |
| Anthropic (Claude 3.7 Sonnet thinking) | Global | ~280ms | ~30 tps | ~99.9% | ~$3.00 | ~$15.00 | ~200K |
| Google (Gemini 2.0 Pro thinking) | Global | ~240ms | ~45 tps | ~99.9% | ~$1.50 | ~$6.00 | ~200K |
Performance benchmarks
Technical Specifications
| Metric | Trinity Large Thinking | OpenAI o3 | Anthropic Claude Sonnet 4 |
|---|---|---|---|
| Avg Latency | ~1.8s | ~2.5s | ~2.2s |
| Context Window | 262K | 200K | 200K |
| Input Price ($/1M) | $0.22 | $2.00 | $3.00 |
| Output Price ($/1M) | $0.80 | $8.00 | $15.00 |
| Max Output Tokens | 80K | 32K | 8K |
| Throughput | ~40 tps | ~20 tps | ~18 tps |
| Uptime | 99.9% | 99.9% | 99.9% |
30-day usage via LLM API
- 12.5B
- Prompt tokens processed (last 30 days)
- 145M
- Completion tokens generated (30 days)
- 3.1M
- API requests served (30 days)
- 99.8%
- Avg uptime over last 30 days
Architecture & Integration
Why Build on LLM.API?
One unified API. Every major model. Built-in reliability, cost control, and observability.
-
Intelligent Model Routing
Automatically route each request to the optimal model across providers based on latency, capability, and constraints—no client changes or custom logic required.
One endpoint, every model. -
Cost-Aware Orchestration
Enforce per-project and per-request cost controls while dynamically selecting cheaper equivalents, so you stay within budget without manually tuning every call.
Optimize spend by default. -
Resilient Fallback Flows
Define automatic cross-provider fallbacks when a model is slow, degraded, or unavailable, keeping production traffic flowing without brittle retries or custom failover code.
No single point of failure. -
Full-Stack Observability
Get unified traces, metrics, and logs for every AI call—across vendors—so you can debug prompts, compare models, and track performance from one dashboard.
See every token, everywhere. -
Task-Level Abstractions
Describe tasks—chat, tools, RAG, classification—once and let LLM.API handle providers, schemas, and quirks so you ship features instead of glue code.
Code to tasks, not models. -
High-Throughput Batch Jobs
Run massive batch inference across providers with automatic parallelization, rate-limit handling, and retries, turning slow offline jobs into predictable pipelines.
Batch at production scale.
Decision guide
When to Use — When NOT to Use
Use it if...
- You need strong multi-step reasoning for agentic workflows, planning, and complex decision pipelines.
- You need long-context processing, like 200K+ token transcripts, logs, or technical documents.
- Your use case involves building autonomous coding or debugging agents with reliable tool use.
- You need transparent chain-of-thought style traces for auditable, stepwise enterprise reasoning.
- Your use case involves self-hosting or fine-tuning an Apache-2.0-licensed open-weights reasoning model.
- You need high reasoning benchmarks at lower cost than frontier closed-source reasoning models.
- Your use case involves experimental orchestration of multiple tools and APIs across long multi-turn sessions.
Avoid if...
- You need ultra-low-latency real-time chat or streaming responses with minimal reasoning overhead.
- Your workload requires strict output brevity without verbose reasoning traces or intermediate thoughts.
- You need lightweight on-device inference on CPUs or small GPUs with tight memory limits.
- Your workload requires multimodal vision, image understanding, or audio capabilities not provided by Trinity.
- You need a simple, fast, general-purpose assistant where deep reasoning is unnecessary overkill.
- Your workload requires hardened, compliance-certified enterprise hosting where Arcee’s stack is unavailable.
- You need predictable, very low token usage budgets instead of long, exploratory reasoning outputs.
FAQ
Frequently Asked Questions
-
What is Trinity Large Thinking?
Trinity Large Thinking is a large language model by Arcee AI optimized for complex reasoning, analysis, and multi-step problem solving.
-
What is Trinity Large Thinking best suited for?
It is best for long-form reasoning tasks like code review, technical design discussions, research synthesis, and stepwise planning where depth of analysis matters.
-
What is the context window of Trinity Large Thinking?
Trinity Large Thinking supports a large context window suitable for multi-document prompts, but exact token limits depend on the LLM.API deployment configuration.
-
How fast is Trinity Large Thinking in terms of latency?
Latency varies by LLM.API region and load, but Trinity Large Thinking is tuned for interactive use rather than ultra-low-latency streaming workloads.
-
What modalities does Trinity Large Thinking support?
Trinity Large Thinking is a text-only model focused on natural language input and output, without native image or audio processing.
-
How do I access Trinity Large Thinking through LLM.API?
You can call Trinity Large Thinking via LLM.API by specifying the provider as Arcee AI and the model name as Trinity Large Thinking in requests.
-
How is Trinity Large Thinking priced on LLM.API?
Pricing for Trinity Large Thinking is usage-based on input and output tokens, with exact rates listed on the LLM.API pricing page for Arcee AI models.
-
How does Trinity Large Thinking compare to smaller Trinity models?
Compared to smaller Trinity variants, Trinity Large Thinking offers stronger reasoning and accuracy at higher compute cost and slightly increased latency.
-
What are the main limitations of Trinity Large Thinking?
It can still hallucinate facts, struggle with very recent information, and is not suitable for real-time control or safety-critical decision-making without oversight.
-
Can I use Trinity Large Thinking for tool calling or function calling via LLM.API?
Yes, as long as your LLM.API integration defines tools or functions, Trinity Large Thinking can follow structured tool-calling schemas in prompts.
