multi-qa-mpnet-base-dot-v1

Text Embedding

multi-qa-mpnet-base-dot-v1 is a Sentence Transformers model that encodes sentences and paragraphs into 768-dimensional embeddings optimized for semantic search using dot-product similarity. It is trained on large-scale question–answer pairs to retrieve relevant passages for user queries.

Start Using API

API Performance

Latency: ~0.3s avg embedding time for short text
Context: 512 max tokens per input
Input: Free per 1M tokens (self-hosted open-source)
Output: Free per 1M tokens (embeddings only)
Uptime: 99%

About the model

What is multi-qa-mpnet-base-dot-v1?

multi-qa-mpnet-base-dot-v1 is a sentence-transformers text-embedding model based on mpnet-base that maps natural-language inputs to 768-dimensional dense vectors. It is mainly used for semantic search, where both queries and candidate passages are embedded and ranked via dot-product similarity. It is also applied to related tasks like text similarity, clustering, and information retrieval in downstream applications. The model is part of the Sentence Transformers family and is fine-tuned from the pretrained mpnet-base transformer encoder.
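As a rough illustration of the ranking step described above, the sketch below scores passages against a query by dot product and sorts them. The helper names (`dot`, `rank_passages`, `embed`) are hypothetical. `embed` assumes the `sentence-transformers` package is installed and that the model is available under the Hub ID `sentence-transformers/multi-qa-mpnet-base-dot-v1`; it is imported lazily and not called here. The toy 3-dimensional vectors stand in for real 768-dimensional embeddings.

```python
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def rank_passages(query_vec, passage_vecs) -> List[Tuple[int, float]]:
    """Return (passage_index, score) pairs, highest dot product first."""
    scores = [(i, dot(query_vec, p)) for i, p in enumerate(passage_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


def embed(texts: List[str]):
    """Assumed sentence-transformers usage; downloads the model on first call."""
    from sentence_transformers import SentenceTransformer  # lazy import

    model = SentenceTransformer("sentence-transformers/multi-qa-mpnet-base-dot-v1")
    return model.encode(texts)


# Toy vectors standing in for real embeddings.
query = [1.0, 0.0, 2.0]
passages = [[0.5, 0.5, 0.5], [2.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
ranking = rank_passages(query, passages)
print(ranking)  # passage 1 scores highest, passage 2 lowest
```

With real embeddings, something like `rank_passages(embed([question])[0], embed(candidate_texts))` would apply the same ranking to model output.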

Input / Output

Input

Text sentences, questions, or short paragraphs (up to 512 wordpieces)

Output

Numerical sentence embeddings (768-dimensional vectors)

Model capabilities

5 Core Capabilities

Semantic Text Search

Enables high-quality semantic search by encoding queries and documents into a shared vector space for similarity-based retrieval.

Question Answer Retrieval

Optimized for multi-domain question-answer retrieval, matching user questions to the most relevant passages or FAQ-style answers.

Sentence Embedding Generation

Produces dense sentence and passage embeddings capturing semantic meaning, suitable for clustering, ranking, and downstream NLP tasks.

Cross-lingual Question Matching

Can map questions in other languages into the same embedding space, but the model works best for English and is not specifically optimized for multilingual coverage, so cross-lingual retrieval quality may be weaker.

Text Similarity Scoring

Computes dot-product similarity between query and document embeddings to score relevance for information retrieval and recommendation pipelines.
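Dot-product and cosine scores can rank the same candidates differently, because the dot product also rewards vector length while cosine similarity only measures direction. Since this model is described as optimized for dot-product similarity, the distinction matters when configuring a vector index. A minimal sketch with toy 2-dimensional vectors (the function names are illustrative, not part of any library):

```python
import math
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Unnormalized dot product: sensitive to both direction and magnitude."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product divided by both vector lengths: direction only."""
    norm = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot(a, b) / norm


query = [1.0, 1.0]
doc_long = [3.0, 0.0]     # large magnitude, only partly aligned with the query
doc_aligned = [1.0, 1.0]  # same direction as the query, smaller magnitude

dot_prefers_long = dot(query, doc_long) > dot(query, doc_aligned)
cosine_prefers_aligned = cosine(query, doc_aligned) > cosine(query, doc_long)
print(dot_prefers_long, cosine_prefers_aligned)
```

The two metrics disagree on which document is the better match here, which is why embeddings from a dot-product model are usually indexed with an inner-product metric rather than a cosine one.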

Use cases

6 Most Valuable Use Cases

Semantic Document Search
Question Answer Retrieval
FAQ Matching System
Duplicate Issue Detection
Recommendation via Similarity
Embedding-Based Retrieval

Transparent pricing

Cost Comparison

LLM.API embeddings for multi-qa-mpnet-base-dot-v1 are roughly 60% to 75% cheaper than the other providers listed, based on the input prices in the table below

Provider	Region	Latency	Throughput	Uptime	Input ($/1M tokens)	Output ($/1M tokens)	Context
LLM.API	Global	~120ms	~9000 tokens/s	99.99%	$0.02	$0.00	~512 tokens
Sentence Transformers (Hosted API)	Global	~250ms	~3000 tokens/s	~99.9%	~$0.05	$0.00	~512 tokens
Hugging Face Inference API	EU West	~280ms	~2500 tokens/s	~99.9%	~$0.06	$0.00	~512 tokens
Azure AI (Custom Container)	US East	~220ms	~4000 tokens/s	99.9%	~$0.07	$0.00	~512 tokens
Replicate	Global	~300ms	~2000 tokens/s	~99.5%	~$0.08	$0.00	~512 tokens
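The relative savings implied by the table can be checked with a little arithmetic. The sketch below copies the input prices from the rows above (treating the approximate figures as exact) and computes how much cheaper the first row is than each of the others; `percent_cheaper` is an illustrative helper, not part of any API.

```python
# Input prices in USD per 1M tokens, copied from the comparison table.
input_price = {
    "LLM.API": 0.02,
    "Sentence Transformers (Hosted API)": 0.05,
    "Hugging Face Inference API": 0.06,
    "Azure AI (Custom Container)": 0.07,
    "Replicate": 0.08,
}


def percent_cheaper(ours: float, theirs: float) -> float:
    """Percentage by which `ours` undercuts `theirs`."""
    return (theirs - ours) / theirs * 100


baseline = input_price["LLM.API"]
savings = {
    name: round(percent_cheaper(baseline, price), 1)
    for name, price in input_price.items()
    if name != "LLM.API"
}

for name, pct in savings.items():
    print(f"{name}: {pct}% cheaper")
```

On these figures the savings range from 60% against the cheapest alternative to 75% against the most expensive one.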

Performance benchmarks

Technical Specifications

Metric	multi-qa-mpnet-base-dot-v1	all-mpnet-base-v2	msmarco-distilbert-base-v3
Model Type	Text embedding (bi-encoder)	General-purpose text embedding	MS MARCO passage ranking bi-encoder
Dimensions	768	768	768
Max Input Tokens	~512	~512	~512
Price per 1M Tokens	~$0.10	~$0.10	~$0.08
Avg Latency per 1K Tokens	~120ms	~130ms	~110ms
Throughput	~1,500 tps	~1,400 tps	~1,600 tps
Uptime	~99.9%	~99.9%	~99.9%

30-day usage via LLM API

620M: Embedding tokens processed (30 days)
7.8M: API requests (30 days)
41.5K: Active developer accounts (30 days)
99.95%: Avg API uptime (30 days)

Architecture & Integration

Why Build on LLM.API?

One unified API. Every major model. Built-in reliability, cost control, and observability.

Unified AI Routing

Dynamically route requests across providers based on latency, cost, or quality policies. One API abstracts model sprawl so you ship faster with less integration work.
One endpoint, every model.

Cost-Aware Orchestration

Automatically balance model choice and token usage to hit your budget targets. Set cost policies once and let LLM.API optimize every call for price-performance.
Lower spend, same output.

Resilient Fallbacks

Define per-route fallback chains so outages or rate limits never break production. LLM.API transparently retries on alternate models while preserving contracts and payloads.
Stay online, automatically.

End-to-End Observability

Trace every request across providers with unified logs, metrics, and evaluations. Debug latency, failures, and quality issues from a single, provider-agnostic dashboard.
See every token hop.

Task-Level Abstractions

Declare tasks like chat, embeddings, tools, or rerank once; LLM.API handles prompt shaping and provider quirks so your code stays clean and portable.
Code to tasks, not vendors.

High-Throughput Batch

Send thousands of requests in a single batch with built-in concurrency control and retries. Maximize throughput while keeping provider limits and costs in check.
Scale workloads, not code.
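To make the fallback-chain idea concrete, here is a generic sketch of the pattern: try each provider in order and return the first successful result. This is not LLM.API's actual implementation or client interface; the provider callables, `embed_with_fallback`, and `AllProvidersFailed` are all hypothetical names used for illustration.

```python
from typing import Callable, List, Sequence

Embedder = Callable[[str], List[float]]


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the chain has failed."""


def embed_with_fallback(text: str, providers: Sequence[Embedder]) -> List[float]:
    """Try each provider in order and return the first successful embedding."""
    errors = []
    for provider in providers:
        try:
            return provider(text)
        except Exception as exc:  # e.g. timeouts or rate limits
            errors.append(exc)
    raise AllProvidersFailed(f"{len(errors)} provider(s) failed: {errors}")


# Stub providers standing in for real API clients.
def flaky_primary(text: str) -> List[float]:
    raise TimeoutError("primary provider timed out")


def stable_backup(text: str) -> List[float]:
    return [0.1, 0.2, 0.3]


vector = embed_with_fallback("how do I reset my password?", [flaky_primary, stable_backup])
print(vector)  # the backup provider's stub embedding
```

A production version would typically add per-provider timeouts and retry limits; those are left out to keep the pattern visible.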

Decision guide

When to Use — When NOT to Use

Use it if...

You need high-quality semantic search over short to medium-length English text passages.
You need to power FAQ-style question–answer retrieval using dense vector similarity search.
Your use case involves clustering or deduplicating similar sentences, queries, or support tickets.
You need fixed-size 768-dimensional question and passage embeddings that are compatible with many existing vector databases.
Your use case involves reranking BM25 search results with lightweight semantic similarity scoring.
You need an open-source, locally deployable embedding model without external API dependencies.
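The BM25 reranking case in the list above can be sketched as a two-stage pipeline: keep the top candidates by lexical score, then reorder that shortlist by dot product against the query embedding. The candidate layout and the `rerank` helper are assumptions made for illustration, and the 2-dimensional vectors are toy stand-ins for real embeddings.

```python
from typing import List, Sequence, Tuple

# (doc_id, bm25_score, embedding) - a hypothetical candidate layout.
Candidate = Tuple[str, float, Sequence[float]]


def rerank(
    query_vec: Sequence[float], candidates: List[Candidate], top_k: int = 10
) -> List[Tuple[str, float]]:
    """Shortlist by BM25 score, then reorder the shortlist by dot product."""
    shortlist = sorted(candidates, key=lambda c: c[1], reverse=True)[:top_k]
    rescored = [
        (doc_id, sum(q * d for q, d in zip(query_vec, emb)))
        for doc_id, _bm25, emb in shortlist
    ]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


query_vec = [1.0, 0.0]
candidates = [
    ("a", 9.0, [0.1, 0.9]),  # best lexical match, weak semantic match
    ("b", 7.5, [0.8, 0.1]),
    ("c", 3.0, [0.9, 0.0]),  # strong semantic match, but outside the shortlist
]
reranked = rerank(query_vec, candidates, top_k=2)
print([doc_id for doc_id, _ in reranked])
```

Document "c" never reaches the semantic stage because the lexical stage drops it, which is the usual trade-off of reranking: it is cheap, but it can only reorder what the first stage retrieved.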

Avoid if...

You need to process very long documents end-to-end, beyond a few paragraphs per embedding.
Your workload requires generating text, summarization, translation, or other generative capabilities.
You need domain-specific embeddings for code, biology, or finance with strong specialized performance.
Your workload requires state-of-the-art cross-lingual performance across many low-resource languages.
You need real-time personalization or training on-the-fly, beyond fixed pretrained embeddings.
Your workload requires detailed reasoning, logical inference, or multi-step planning over retrieved content.

FAQ

Frequently Asked Questions

What is multi-qa-mpnet-base-dot-v1?

multi-qa-mpnet-base-dot-v1 is a Sentence Transformers model for generating dense embeddings tailored to multi-domain question answering and semantic search.

What is multi-qa-mpnet-base-dot-v1 best used for?

It is best for semantic search, duplicate question detection, information retrieval, and building QA systems over documents or knowledge bases.

How much does it cost to use multi-qa-mpnet-base-dot-v1 via LLM.API?

LLM.API usage-based pricing applies per embedding request; check the multi-qa-mpnet-base-dot-v1 pricing section in your LLM.API dashboard.

What context window or maximum input length does multi-qa-mpnet-base-dot-v1 support?

multi-qa-mpnet-base-dot-v1 accepts inputs of up to 512 tokens (word pieces); longer texts should be chunked before embedding.

How fast is multi-qa-mpnet-base-dot-v1 on LLM.API?

On LLM.API, it usually returns embeddings in tens to low hundreds of milliseconds per request, depending on batch size and load.

What modalities does multi-qa-mpnet-base-dot-v1 support?

multi-qa-mpnet-base-dot-v1 is a text-only model that takes natural language strings and outputs fixed-size vector embeddings.

How do I call multi-qa-mpnet-base-dot-v1 through LLM.API?

Use the LLM.API embeddings endpoint, set the provider to Sentence Transformers, and specify multi-qa-mpnet-base-dot-v1 as the model name.

How does multi-qa-mpnet-base-dot-v1 compare to other Sentence Transformers QA models?

It generally offers strong retrieval quality with moderate embedding size and speed, often outperforming older MiniLM-based QA embeddings.

Can I use multi-qa-mpnet-base-dot-v1 for multilingual queries?

It works best for English; performance on other languages may be weaker, and the model is not specifically optimized for broad multilingual coverage.

What are the main limitations of multi-qa-mpnet-base-dot-v1?

It cannot generate text, handle images, or reason over very long documents without chunking, and embedding quality degrades on noisy or non-English data.
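The chunking advice in the FAQ above can be sketched as a sliding window over the text. This version counts whitespace-separated words as a rough proxy for tokens; real word-piece counts come from the model's tokenizer and are usually higher than the word count, so the default `max_words` below is a deliberately conservative assumption rather than a recommended setting. `chunk_words` is a hypothetical helper.

```python
from typing import List


def chunk_words(text: str, max_words: int = 200, overlap: int = 20) -> List[str]:
    """Split text into overlapping word windows of at most max_words words."""
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the current window already reaches the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
chunks = chunk_words(sample, max_words=4, overlap=1)
for chunk in chunks:
    print(chunk)
```

Each chunk is then embedded separately, and a query is matched against chunk embeddings rather than a single whole-document vector. The overlap keeps a sentence that straddles a boundary from being split between two unrelated chunks.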

Start in 2 lines of code

Get My API Key
