Nemotron 3 Super (free)

Text Generation

Nemotron 3 Super (free) is NVIDIA’s open‑weights, high‑throughput 120B-parameter hybrid mixture‑of‑experts language model, optimized for complex agentic AI and multi‑agent reasoning workloads. It is notable for combining a hybrid Mamba‑Transformer architecture, LatentMoE sparsity, and a 1M‑token context window to deliver efficient long‑horizon reasoning.

Start Using API

API Performance

Latency: ~1.0s avg response
Context: ~8K token context
Input: Free per 1M tokens
Output: Free per 1M tokens
Uptime: 99% 99%

About the model

What is Nemotron 3 Super (free)?

Nemotron 3 Super is an open, 120B-parameter hybrid Mamba-Transformer mixture-of-experts model from NVIDIA designed for high-accuracy, efficient agentic reasoning. It is mainly used to power multi-agent and enterprise AI workflows that require long-context reasoning, planning, and orchestration across many tools or services. It is also well-suited for code, math, and complex multistep generation tasks where high throughput and long sequences are important. It belongs to the Nemotron 3 family of open models (Nano, Super, Ultra), succeeding earlier Nemotron generations.

Input / Output

Input

Text prompts (natural language, code, or structured text tokens)

Output

Text responses (natural language or structured chat-style output)
Generated or completed source code in text form

Model capabilities

5 Core Capabilities

Agentic Reasoning

Supports multi‑agent, tool-using AI workflows, coordinating complex tasks with high throughput and long-horizon reasoning across agents.
Long-Context Processing

Handles sequences up to around one million tokens, enabling analysis of large documents, codebases, and extended conversations without losing context.
Multilingual Text

Generates and understands text in multiple languages, including English and Japanese, for global applications and cross-lingual workflows.
General Chat

Engages in open-domain dialogue, following instructions, answering questions, and assisting with writing or brainstorming in natural language.
Code and Data Text

Trained on diverse web, code, and technical data, enabling structured outputs, explanations, and reasoning over text-based information sources.

Use cases

6 Most Valuable Use Cases

Code Generation Assistance
Customer Support Chatbots
Document Summarization
Semantic Text Tagging
Legal Case Research
Regulatory Case Monitoring

Transparent pricing

Cost Comparison

LLM API offers the lowest cost and highest performance for Nemotron-class models.

Provider	Region	Latency	Throughput	Uptime	Input ($/1M)	Output ($/1M)	Context
LLM API BEST	Global	80ms	120 tps	99.99%	$0.10	$0.10	128K tokens
NVIDIA	Global	~180ms	~60 tps	~99.9%	$0.00	$0.00	~128K tokens
AWS Bedrock	US East	~220ms	~45 tps	~99.9%	~$0.25	~$0.25	~128K tokens
Google Cloud	Global	~210ms	~50 tps	~99.9%	~$0.24	~$0.24	~128K tokens
Azure AI	EU West	~230ms	~40 tps	~99.9%	~$0.26	~$0.26	~128K tokens

Performance benchmarks

Technical Specifications

Metric	Nemotron 3 Super (free)	Llama 3 8B Instruct (free)	Mistral 7B Instruct (free)
Avg Latency	~800ms	~900ms	~850ms
Context Window	8K	8K	8K
Input Price ($/1M)	$0.00	$0.00	$0.00
Output Price ($/1M)	$0.00	$0.00	$0.00
Max Output Tokens	2K	2K	2K
Throughput	~30 tps	~25 tps	~25 tps
Uptime	99.9%	99.9%	99.9%

30-day usage via LLM API

9.8B: Prompt tokens processed (last 30 days)
3.1B: Completion tokens generated (last 30 days)
12.5M: API requests served (last 30 days)
99.9%: Average uptime (last 30 days)

Start Using API

Architecture & Integration

Why Build on LLM.API?

One unified API. Every major model. Built-in reliability, cost control, and observability.

Unified AI Routing

Intelligently route each request across providers and models based on latency, capability, or custom rules. One API, always the best path for your workload.
Smart traffic, single endpoint
Cost-Aware Orchestration

Automatically balance performance and price with configurable policies. Use premium models when it matters, fall back to cheaper ones when it doesn’t.
Optimize spend by default
Resilient Fallbacks

Define multi-step failover chains across providers so requests keep flowing through outages, rate limits, or model errors—without touching your application code.
Stay online under stress
Deep Observability

Get full visibility into requests, tokens, latency, errors, and providers with structured logs and traces. Debug faster and tune workloads with real data.
See every token spent
Task-Level Abstractions

Describe tasks like chat, tools, reranking, or extraction once and run them on any model. Ship features without rewriting prompts per provider.
Code to tasks, not models
High-Throughput Batch

Submit massive batch jobs through a single API with queuing, retries, and cost controls built-in. Process millions of inputs without custom infrastructure.
Scale jobs, not ops

Decision guide

When to Use — When NOT to Use

Use it if...

You need a free, general-purpose model for everyday coding, writing, and Q&A.
You need to prototype AI features without incurring usage costs during experimentation.
Your use case involves moderate-length chats where perfect reasoning is not critical.
Your use case involves simple code snippets, bug fixes, or small refactors.
You need a baseline model to compare against stronger proprietary or paid systems.
Your use case involves occasional content generation, summaries, and simple data extraction.

Avoid if...

You need state-of-the-art reasoning quality for complex multi-step or high-stakes decisions.
Your workload requires very long-context processing, such as full-book analysis or logs.
You need top-tier code generation for large projects, architectures, or unfamiliar stacks.
Your workload requires highly reliable factual answers on niche, technical, or evolving topics.
You need best-in-class safety controls, compliance certifications, or robust content-filter customization.
Your workload requires highly optimized latency and throughput for large-scale, performance-critical production.

FAQ

Frequently Asked Questions

What is Nemotron 3 Super (free)?

Nemotron 3 Super (free) is an NVIDIA large language model accessible via LLM.API, tuned for general-purpose text generation and assistant-style conversations.
What is Nemotron 3 Super (free) best suited for?

It is best for fast, low-cost chat-style interactions, drafting content, and lightweight reasoning where cost and accessibility matter more than cutting-edge intelligence.
How is Nemotron 3 Super (free) priced on LLM.API?

The free tier incurs no direct per-token charges to you, but may be subject to rate limits and usage caps enforced by LLM.API.
What context window does Nemotron 3 Super (free) support?

Nemotron 3 Super (free) supports a context window of up to 8K tokens, including both prompt and response tokens.
How fast is Nemotron 3 Super (free) in terms of latency?

Latency is typically low for short prompts, but can increase under heavy shared-load conditions because the free tier runs on pooled infrastructure.
Which modalities does Nemotron 3 Super (free) support?

Nemotron 3 Super (free) supports text-in, text-out interactions only and does not natively process images, audio, or video.
How do I call Nemotron 3 Super (free) through the LLM.API gateway?

You select the model by its identifier in the LLM.API completion or chat endpoint, passing your prompt and standard configuration parameters like temperature.
How does Nemotron 3 Super (free) compare to larger NVIDIA or frontier models?

Compared to larger or paid frontier models, it is generally cheaper and more accessible but weaker on complex reasoning, coding, and long-context tasks.
What are the main limitations of Nemotron 3 Super (free)?

It may hallucinate facts, struggle with very long or deeply technical tasks, and lacks multimodal capabilities and fine-grained enterprise controls.
Can I use Nemotron 3 Super (free) for production workloads?

You can, but should account for potential rate limits, variable performance, and weaker reliability than dedicated, paid production-grade NVIDIA deployments.

Start in 2 lines of code

Get My API Key

Nemotron 3 Super (free)

What is Nemotron 3 Super (free)?

5 Core Capabilities

Agentic Reasoning

Long-Context Processing

Multilingual Text

General Chat

Code and Data Text

6 Most Valuable Use Cases

Cost Comparison

Technical Specifications

Why Build on LLM.API?

Unified AI Routing

Cost-Aware Orchestration

Resilient Fallbacks

Deep Observability

Task-Level Abstractions

High-Throughput Batch

When to Use — When NOT to Use

Use it if...

Avoid if...

Start in 2 lines of code