Parakeet TDT 0.6B v3

Speech-to-Text

Parakeet TDT 0.6B v3 is NVIDIA’s 600M-parameter multilingual automatic speech recognition (ASR) model built on the FastConformer-TDT architecture, optimized for high-throughput speech-to-text across European languages.

API Performance

Latency: ~0.3s time to first token
Context: 4K tokens
Input: Free per 1M tokens
Output: Free per 1M tokens
Uptime: 99%

About the model

What is Parakeet TDT 0.6B v3?

Parakeet TDT 0.6B v3 is a 600-million-parameter multilingual speech-to-text model from NVIDIA. It is based on the FastConformer-TDT architecture and was trained on over 670,000 hours of audio from the Granary dataset. It is primarily used for real-time and batch transcription of audio and video, with automatic language detection across 25 European languages and output that includes punctuation and timestamps. It is also used in cost-efficient pipelines and offline tools as an alternative to Whisper-class ASR for multilingual captioning, dictation, and media indexing. The model is part of NVIDIA’s Parakeet family and follows the earlier Parakeet TDT 0.6B v2 and related multilingual ASR work.

Input / Output

Input

Audio files (FLAC, MP3, MP4, MPEG, MPGA, M4A, OGG, WAV, WEBM)
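Before uploading, a client can check a file's extension against the formats listed above. The following is a minimal sketch; the function name `is_supported_audio` is illustrative and not part of any LLM.API SDK, and an extension check does not verify the actual container or codec.

```python
from pathlib import Path

# Extensions taken from the supported-input list above.
SUPPORTED_EXTENSIONS = {
    ".flac", ".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".ogg", ".wav", ".webm",
}


def is_supported_audio(filename: str) -> bool:
    """Return True if the file extension appears in the supported-input list."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```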

Output

Transcribed text with punctuation and segment timestamps (JSON, plain text, SRT, VTT)
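If you request JSON and want subtitles, segment timestamps can be converted to SRT on the client. The sketch below assumes each segment is a dict with `start` and `end` in seconds and a `text` field; that schema is an assumption for illustration, so check the actual response shape in the API reference.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Render segments as SRT cues, numbered from 1 and separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        cues.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(cues)
```

WebVTT output differs mainly in the `WEBVTT` header line and in using a period instead of a comma before the milliseconds.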

Model capabilities

5 Core Capabilities

Multilingual ASR

Performs automatic speech recognition across 25 European languages, converting spoken audio into accurate text transcripts.

Language Detection

Automatically identifies the spoken language in input audio among supported European languages before transcribing.

Long Audio Handling

Processes long-form recordings up to several hours using FastConformer local attention while maintaining throughput and stability.

Timestamped Transcripts

Generates transcripts with punctuation and segment-level timestamps suitable for indexing, search, and subtitle generation.

Cross-Language Transcription

Supports consistent transcription quality across diverse European languages, enabling unified multilingual speech-to-text workflows.
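To illustrate how segment-level timestamps support indexing and search, the sketch below builds a small inverted index from words to segment start times. The segment schema (`start` in seconds, `text`) is assumed for illustration and is not taken from the API reference.

```python
import re
from collections import defaultdict


def build_index(segments: list[dict]) -> dict[str, list[float]]:
    """Map each lowercased word to the start times of the segments containing it."""
    index: dict[str, list[float]] = defaultdict(list)
    for seg in segments:
        # Deduplicate words within a segment so each start time is listed once.
        for word in set(re.findall(r"\w+", seg["text"].lower())):
            index[word].append(seg["start"])
    return dict(index)
```

Looking up a word then returns the offsets to seek to in the recording, for example `build_index(segments).get("invoice", [])`.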

Use cases

6 Most Valuable Use Cases

Call Center Transcription
Voice Command Interfaces
Media Subtitle Generation
Meeting Notes Transcription
Customer Support Analytics
On-Device Dictation

Transparent pricing

Cost Comparison

LLM.API lists the lowest cost and latency among the providers compared in the table below for Parakeet-class TDT models.

Provider	Region	Latency	Throughput	Uptime	Input ($ per 1M tokens)	Output ($ per 1M tokens)	Context (tokens)
LLM.API (best)	Global	80 ms	120 tps	99.99%	$0.08	$0.08	128K
NVIDIA (Parakeet TDT 0.6B v3 via NIM)	US West	~140 ms	~60 tps	~99.9%	~$0.20	~$0.20	~32K
AWS Bedrock (Parakeet‑equivalent small TDT)	US East	~160 ms	~40 tps	99.9%	~$0.30	~$0.30	~32K
Azure AI (Parakeet‑class small TDT)	EU West	~170 ms	~35 tps	99.9%	~$0.32	~$0.32	~32K
GCP Vertex AI (Parakeet‑class small TDT)	Global	~180 ms	~30 tps	~99.9%	~$0.35	~$0.35	~32K

Performance benchmarks

Technical Specifications

Metric	Parakeet TDT 0.6B v3	Parakeet TDT 1.1B v3	Parakeet TDT 0.6B
Model Type	Transducer / TDT ASR	Transducer / TDT ASR	Transducer ASR
Parameter Count	0.6B	1.1B	0.6B
Latency	—	—	—
Languages Supported	25 European languages	English	English
Price per Minute	—	—	—
Max Audio Duration	—	—	—
Accuracy (WER)	—	—	—
Uptime	—	—	—
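The WER row refers to word error rate: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the model output, divided by the number of reference words. Here is a minimal sketch, assuming lowercasing and whitespace tokenization only; published benchmarks typically apply heavier text normalization, so values from this function are not directly comparable to them.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Compute WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the ref prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)
```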

30-day usage via LLM.API

720M: Prompt tokens processed (30 days)
210M: Completion tokens generated (30 days)
2.4M: API requests served (30 days)
99.95%: Avg uptime over last 30 days

Architecture & Integration

Why Build on LLM.API?

One unified API. Every major model. Built-in reliability, cost control, and observability.

Unified AI Routing

Automatically route each request to the optimal model across providers based on latency, cost, or quality, without changing your integration or redeploying code.
One endpoint, every model.

Cost-Aware Orchestration

Balance quality and spend by mixing premium and budget models, enforcing per-project limits, and getting clear cost insights per call, team, and environment.
Cut spend, keep quality.

Resilient Fallback Flows

Define automatic fallbacks across models and providers so timeouts, rate limits, or regional outages degrade gracefully instead of breaking your production workloads.
No single point of failure.

End-to-End Observability

Trace every request across models with logs, metrics, and structured events to debug latency, drift, and errors directly from your AI gateway, not scattered dashboards.
See every token hop.

Task-Level Abstractions

Describe tasks like chat, tools, RAG, or moderation once and let LLM.API pick the best implementation details and models for each provider behind the scenes.
Think tasks, not APIs.

High-Throughput Batch

Process millions of inferences efficiently with provider-optimized batching, backoff, and concurrency controls that maximize throughput while staying within rate and budget limits.
Scale from 10 to millions.
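The fallback and backoff behaviour described above can be pictured with a small client-side sketch. This is an illustration of the general pattern only, not how LLM.API implements it; the function name, the provider callables, and the retry defaults are all hypothetical.

```python
import time
from typing import Callable, Sequence


def transcribe_with_fallback(
    audio: bytes,
    providers: Sequence[tuple[str, Callable[[bytes], str]]],
    retries_per_provider: int = 2,
    base_delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[str, str]:
    """Try each provider in order and return (provider_name, transcript)."""
    errors: list[str] = []
    for name, call in providers:
        for attempt in range(retries_per_provider):
            try:
                return name, call(audio)
            except (TimeoutError, ConnectionError) as exc:
                errors.append(f"{name} attempt {attempt + 1}: {exc!r}")
                # Exponential backoff before retrying the same provider.
                if attempt + 1 < retries_per_provider:
                    sleep(base_delay_s * 2 ** attempt)
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

The `sleep` parameter is injectable so the retry schedule can be tested without real delays.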

Decision guide

When to Use — When NOT to Use

Use it if...

You need a compact speech model for low-resource or embedded transcription deployments.
Your use case involves experimenting with NVIDIA Riva or NeMo-compatible speech pipelines.
You need cost-effective batch transcription for short audio clips in controlled environments.
Your use case involves prototyping speech-to-text features without requiring top-tier accuracy.
You need a small-footprint model suitable for GPU sharing among multiple concurrent jobs.

Avoid if...

You need state-of-the-art transcription accuracy across many accents, domains, and noisy recordings.
Your workload requires speaker diarization; the listed output is text with punctuation and segment timestamps, not speaker labels.
You need speech recognition for languages outside the 25 supported European languages.
Your workload requires advanced reasoning, summarization, or multimodal understanding beyond plain transcription.
You need enterprise-grade SLAs for mission-critical, large-scale production speech applications.

FAQ

Frequently Asked Questions

What is Parakeet TDT 0.6B v3?

Parakeet TDT 0.6B v3 is a 0.6B-parameter NVIDIA speech model focused on fast, lightweight multilingual transcription.

What is Parakeet TDT 0.6B v3 best suited for?

It is best for real-time or near-real-time speech-to-text in resource-constrained or high-throughput environments.

How is Parakeet TDT 0.6B v3 priced on LLM.API?

Pricing is usage-based per audio duration processed; check the Parakeet TDT 0.6B v3 entry on LLM.API’s pricing page for current rates.

What is the context window or maximum audio length Parakeet TDT 0.6B v3 supports?

LLM.API caps the maximum audio duration per request; refer to the model’s documentation for the latest per-call audio length limit.

How fast is Parakeet TDT 0.6B v3 in terms of latency?

Thanks to its small 0.6B size, it offers low latency and is suitable for streaming or interactive speech applications via LLM.API.

What modalities does Parakeet TDT 0.6B v3 support?

Parakeet TDT 0.6B v3 accepts audio input and produces text output with punctuation and segment timestamps.

How do I call Parakeet TDT 0.6B v3 through LLM.API?

Use the LLM.API speech endpoint with the model identifier for Parakeet TDT 0.6B v3, passing audio data and configuration in the request body.

How does Parakeet TDT 0.6B v3 compare to larger NVIDIA speech models?

Compared to larger Parakeet variants, it trades some accuracy for significantly lower latency, memory usage, and compute cost.

Does Parakeet TDT 0.6B v3 support streaming transcription on LLM.API?

If enabled by LLM.API, you can use streaming mode for incremental transcripts; check the API reference for streaming availability details.

What are the main limitations of Parakeet TDT 0.6B v3?

Its smaller size may reduce accuracy on noisy, highly accented, or domain-specific speech compared with larger, more capable speech models.
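Since the per-request audio limit is set by LLM.API and is not stated on this page, a client may need to split long recordings before upload. The sketch below computes chunk boundaries with an optional overlap; the 600-second cap and 2-second overlap in the usage line are placeholder values, not documented limits.

```python
def chunk_spans(
    total_s: float, max_chunk_s: float, overlap_s: float = 0.0
) -> list[tuple[float, float]]:
    """Split [0, total_s] into (start, end) spans no longer than max_chunk_s."""
    if max_chunk_s <= 0:
        raise ValueError("max_chunk_s must be positive")
    if not 0 <= overlap_s < max_chunk_s:
        raise ValueError("overlap_s must be in [0, max_chunk_s)")

    spans: list[tuple[float, float]] = []
    start = 0.0
    while start < total_s:
        end = min(start + max_chunk_s, total_s)
        spans.append((start, end))
        if end >= total_s:
            break
        # Step back by the overlap so words cut at a boundary appear in both chunks.
        start = end - overlap_s
    return spans
```

For a three-hour recording this could be called as `chunk_spans(3 * 3600, max_chunk_s=600, overlap_s=2)`; overlapping text then has to be deduplicated when the transcripts are merged.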

Start in 2 lines of code
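A request might look roughly like the sketch below, which builds (but does not send) a multipart upload using only the Python standard library. The endpoint URL, the model identifier, the form field names (`model`, `file`, `response_format`), and the Bearer authentication scheme are all assumptions for illustration; take the real values from the LLM.API reference.

```python
import urllib.request
import uuid

# Placeholders: the real endpoint and model identifier are not given on this page.
API_URL = "https://api.example.invalid/v1/audio/transcriptions"
MODEL_ID = "parakeet-tdt-0.6b-v3"


def build_transcription_request(
    audio: bytes, filename: str, api_key: str, response_format: str = "json"
) -> urllib.request.Request:
    """Build a multipart/form-data POST request without sending it."""
    boundary = uuid.uuid4().hex
    parts: list[bytes] = []

    # Plain text form fields (field names are assumed, see the note above).
    for name, value in (("model", MODEL_ID), ("response_format", response_format)):
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode()
        )

    # The audio payload itself.
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    parts.append(file_header + audio + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())

    return urllib.request.Request(
        API_URL,
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Once the placeholders are replaced with real values, the request can be sent with `urllib.request.urlopen(request)` and the response body parsed according to the chosen output format.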
