Powered by Sentence Transformers
multi-qa-mpnet-base-dot-v1
- Text Embedding
multi-qa-mpnet-base-dot-v1 is a Sentence Transformers model that encodes sentences and paragraphs into 768-dimensional embeddings optimized for semantic search using dot-product similarity. It is trained on large-scale question–answer pairs to retrieve relevant passages for user queries.
About the model
What is multi-qa-mpnet-base-dot-v1?
multi-qa-mpnet-base-dot-v1 is a Sentence Transformers text-embedding model, fine-tuned from the pretrained mpnet-base transformer encoder, that maps natural-language inputs to 768-dimensional dense vectors. It is mainly used for semantic search: queries and candidate passages are embedded into the same vector space and ranked by dot-product similarity. It is also applied to related tasks such as text similarity, clustering, and information retrieval in downstream applications.
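As a rough illustration of the embed-then-rank flow described above, the sketch below runs the model locally with the open-source `sentence-transformers` package and ranks passages by dot product. The helper names (`dot`, `rank_by_dot`, `search`) are illustrative and not part of any library. The library import is deferred into `search` so the ranking helpers can be used without installing the package.

```python
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def rank_by_dot(query_vec: Sequence[float],
                passage_vecs: Sequence[Sequence[float]]) -> List[Tuple[int, float]]:
    """Return (passage_index, score) pairs, highest dot product first."""
    scores = [(i, dot(query_vec, p)) for i, p in enumerate(passage_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


def search(query: str, passages: List[str]) -> List[Tuple[str, float]]:
    """Embed the query and passages locally, then rank passages by dot product.

    Assumes `pip install sentence-transformers`; the first call downloads
    the model weights.
    """
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("sentence-transformers/multi-qa-mpnet-base-dot-v1")
    query_vec = model.encode(query)        # one 768-dimensional vector
    passage_vecs = model.encode(passages)  # one 768-dimensional vector per passage
    ranked = rank_by_dot(query_vec.tolist(), passage_vecs.tolist())
    return [(passages[i], score) for i, score in ranked]
```

Calling `search("How do I reset my password?", faq_passages)` would return the passages ordered from most to least relevant. For larger collections, passage embeddings are usually computed once and stored in a vector index rather than re-encoded per query.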
Model capabilities
5 Core Capabilities
- Semantic Text Search
  Enables high-quality semantic search by encoding queries and documents into a shared vector space for similarity-based retrieval.
- Question Answer Retrieval
  Optimized for multi-domain question-answer retrieval, matching user questions to the most relevant passages or FAQ-style answers.
- Sentence Embedding Generation
  Produces dense sentence and passage embeddings capturing semantic meaning, suitable for clustering, ranking, and downstream NLP tasks.
- Duplicate Question Matching
  Maps paraphrased or near-duplicate questions to nearby points in a common embedding space for retrieval. The model works best on English text and is not specifically optimized for cross-lingual matching.
- Text Similarity Scoring
  Computes dot-product similarity between query and document embeddings to score relevance for information retrieval and recommendation pipelines.
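Because the model is optimized for dot-product similarity, scores depend on vector magnitude as well as direction, so dot product and cosine similarity can rank the same candidates differently. The toy vectors below are made up for illustration (real embeddings have 768 dimensions); the sketch only shows why the scoring function should match the one the model was trained with.

```python
import math
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product: sensitive to both direction and magnitude."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: dot product of the length-normalized vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Toy 2-dimensional stand-ins for embeddings (illustrative values only).
query = [1.0, 1.0]
doc_a = [1.0, 1.0]  # same direction as the query, small norm
doc_b = [3.0, 1.0]  # different direction, larger norm

dot_winner = "doc_a" if dot(query, doc_a) > dot(query, doc_b) else "doc_b"
cos_winner = "doc_a" if cosine(query, doc_a) > cosine(query, doc_b) else "doc_b"
print(dot_winner, cos_winner)
```

Here the dot product favors `doc_b` while cosine similarity favors `doc_a`. When storing these embeddings in a vector database, it is generally safer to configure the index for dot-product (inner-product) scoring than to rely on a cosine default.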
Use cases
6 Most Valuable Use Cases
- Semantic Document Search
- Question Answer Retrieval
- FAQ Matching System
- Duplicate Issue Detection
- Recommendation via Similarity
- Embedding-Based Retrieval
Transparent pricing
Cost Comparison
LLM.API embeddings for multi-qa-mpnet-base-dot-v1 are roughly 60% to 75% cheaper than the other listed providers, based on the input prices below
| Provider | Region | Latency | Throughput | Uptime | Input ($/1M tokens) | Output ($/1M tokens) | Max input |
|---|---|---|---|---|---|---|---|
| LLM.API (best) | Global | ~120ms | ~9000 tokens/s | 99.99% | $0.02 | $0.00 | ~512 tokens |
| Sentence Transformers (Hosted API) | Global | ~250ms | ~3000 tokens/s | ~99.9% | ~$0.05 | $0.00 | ~512 tokens |
| Hugging Face Inference API | EU West | ~280ms | ~2500 tokens/s | ~99.9% | ~$0.06 | $0.00 | ~512 tokens |
| Azure AI (Custom Container) | US East | ~220ms | ~4000 tokens/s | 99.9% | ~$0.07 | $0.00 | ~512 tokens |
| Replicate | Global | ~300ms | ~2000 tokens/s | ~99.5% | ~$0.08 | $0.00 | ~512 tokens |
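The relative savings implied by the input prices in the table can be checked with a few lines of arithmetic. The sketch below copies the listed per-1M-token input prices (several of which are approximate) and computes how much cheaper the LLM.API rate is against each alternative; the `savings_pct` helper is illustrative.

```python
# Input prices in USD per 1M tokens, copied from the table above.
prices = {
    "LLM.API": 0.02,
    "Sentence Transformers (Hosted API)": 0.05,
    "Hugging Face Inference API": 0.06,
    "Azure AI (Custom Container)": 0.07,
    "Replicate": 0.08,
}


def savings_pct(ours: float, theirs: float) -> float:
    """Percentage by which `ours` is cheaper than `theirs`, to one decimal."""
    return round((theirs - ours) / theirs * 100, 1)


baseline = prices["LLM.API"]
savings = {
    name: savings_pct(baseline, price)
    for name, price in prices.items()
    if name != "LLM.API"
}

for name, pct in savings.items():
    print(f"{name}: {pct}% cheaper")
```

With the listed prices, the savings range from 60.0% (against the $0.05 rate) to 75.0% (against the $0.08 rate).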
Performance benchmarks
Technical Specifications
| Metric | multi-qa-mpnet-base-dot-v1 | all-mpnet-base-v2 | msmarco-distilbert-base-v3 |
|---|---|---|---|
| Model Type | Text embedding (bi-encoder) | General-purpose text embedding | MS MARCO passage ranking bi-encoder |
| Dimensions | 768 | 768 | 768 |
| Max Input Tokens | ~512 | ~512 | ~512 |
| Price per 1M Tokens | ~$0.10 | ~$0.10 | ~$0.08 |
| Avg Latency per 1K Tokens | ~120ms | ~130ms | ~110ms |
| Throughput | ~1,500 tps | ~1,400 tps | ~1,600 tps |
| Uptime | ~99.9% | ~99.9% | ~99.9% |
30-day usage via LLM.API
- 620M embedding tokens processed (30 days)
- 7.8M API requests (30 days)
- 41.5K active developer accounts (30 days)
- 99.95% avg API uptime (30 days)
Architecture & Integration
Why Build on LLM.API?
One unified API. Every major model. Built-in reliability, cost control, and observability.
- Unified AI Routing
  Dynamically route requests across providers based on latency, cost, or quality policies. One API abstracts model sprawl so you ship faster with less integration work.
  One endpoint, every model.
- Cost-Aware Orchestration
  Automatically balance model choice and token usage to hit your budget targets. Set cost policies once and let LLM.API optimize every call for price-performance.
  Lower spend, same output.
- Resilient Fallbacks
  Define per-route fallback chains so outages or rate limits never break production. LLM.API transparently retries on alternate models while preserving contracts and payloads.
  Stay online, automatically.
- End-to-End Observability
  Trace every request across providers with unified logs, metrics, and evaluations. Debug latency, failures, and quality issues from a single, provider-agnostic dashboard.
  See every token hop.
- Task-Level Abstractions
  Declare tasks like chat, embeddings, tools, or rerank once; LLM.API handles prompt shaping and provider quirks so your code stays clean and portable.
  Code to tasks, not vendors.
- High-Throughput Batch
  Send thousands of requests in a single batch with built-in concurrency control and retries. Maximize throughput while keeping provider limits and costs in check.
  Scale workloads, not code.
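To make the fallback-chain idea concrete, here is a minimal client-side sketch of the general pattern: try each provider in order and return the first successful result. It is not LLM.API's implementation or API. `ProviderError`, `embed_with_fallback`, and the two stand-in providers are hypothetical names invented for this example.

```python
from typing import Callable, List, Sequence

# A provider is modeled as any callable that turns texts into vectors.
Embedder = Callable[[List[str]], List[List[float]]]


class ProviderError(Exception):
    """Hypothetical error type for outages or rate limits."""


def embed_with_fallback(texts: List[str],
                        chain: Sequence[Embedder]) -> List[List[float]]:
    """Try each provider in order; return the first successful result."""
    errors = []
    for provider in chain:
        try:
            return provider(texts)
        except ProviderError as exc:
            errors.append(str(exc))  # remember why this provider was skipped
    raise ProviderError(f"all {len(chain)} providers failed: {errors}")


def flaky_primary(texts: List[str]) -> List[List[float]]:
    """Stand-in for a provider that is currently rate limited."""
    raise ProviderError("rate limited")


def stable_secondary(texts: List[str]) -> List[List[float]]:
    """Stand-in for a healthy provider; returns dummy 3-dimensional vectors."""
    return [[0.0, 0.0, 0.0] for _ in texts]


vectors = embed_with_fallback(["hello", "world"], [flaky_primary, stable_secondary])
print(len(vectors))
```

For embeddings specifically, a fallback only preserves search quality if every provider in the chain serves the same model, since vectors from different embedding models are not comparable.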
Decision guide
When to Use — When NOT to Use
Use it if...
- You need high-quality semantic search over short to medium-length English text passages.
- You need to power FAQ-style question–answer retrieval using dense vector similarity search.
- Your use case involves clustering or deduplicating similar sentences, queries, or support tickets.
- You need fixed-size 768-dimensional question embeddings that are compatible with most existing vector databases.
- Your use case involves reranking BM25 search results with lightweight semantic similarity scoring.
- You need an open-source, locally deployable embedding model without external API dependencies.
Avoid if...
- You need to process very long documents end-to-end, beyond a few paragraphs per embedding.
- Your workload requires generating text, summarization, translation, or other generative capabilities.
- You need domain-specific embeddings for code, biology, or finance with strong specialized performance.
- Your workload requires state-of-the-art cross-lingual performance across many low-resource languages.
- You need real-time personalization or training on-the-fly, beyond fixed pretrained embeddings.
- Your workload requires detailed reasoning, logical inference, or multi-step planning over retrieved content.
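For the long-document case above, a common workaround is to split the text into overlapping chunks and embed each chunk separately. The sketch below uses a simple word-based splitter as an approximation. The `chunk_words` helper and its default sizes are assumptions chosen for illustration, and word counts only roughly track the model's token counts.

```python
from typing import List


def chunk_words(text: str, max_words: int = 200, overlap: int = 40) -> List[str]:
    """Split text into chunks of at most `max_words` words.

    Consecutive chunks share `overlap` words so that sentences cut at a
    boundary still appear intact in at least one chunk.
    """
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


sample = " ".join(str(i) for i in range(10))
print(chunk_words(sample, max_words=4, overlap=1))
```

Each chunk can then be embedded and indexed on its own, with search results mapped back to the parent document. A tokenizer-based splitter would track the model's input limit more precisely than word counts.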
FAQ
Frequently Asked Questions
- What is multi-qa-mpnet-base-dot-v1?
  multi-qa-mpnet-base-dot-v1 is a Sentence Transformers model for generating dense embeddings tailored to multi-domain question answering and semantic search.
- What is multi-qa-mpnet-base-dot-v1 best used for?
  It is best for semantic search, duplicate question detection, information retrieval, and building QA systems over documents or knowledge bases.
- How much does it cost to use multi-qa-mpnet-base-dot-v1 via LLM.API?
  LLM.API bills embedding usage by input tokens, as shown in the Cost Comparison table; check the multi-qa-mpnet-base-dot-v1 pricing section in your LLM.API dashboard for current rates.
- What context window or maximum input length does multi-qa-mpnet-base-dot-v1 support?
  multi-qa-mpnet-base-dot-v1 accepts roughly 512 tokens per input (see Technical Specifications) and is typically used with inputs of up to a few hundred tokens; longer texts should be chunked before embedding.
- How fast is multi-qa-mpnet-base-dot-v1 on LLM.API?
  On LLM.API, it usually returns embeddings within tens to low hundreds of milliseconds per request, depending on batch size and load.
- What modalities does multi-qa-mpnet-base-dot-v1 support?
  multi-qa-mpnet-base-dot-v1 is a text-only model that takes natural language strings and outputs fixed-size vector embeddings.
- How do I call multi-qa-mpnet-base-dot-v1 through LLM.API?
  Use the LLM.API embeddings endpoint, set the provider to Sentence Transformers, and specify multi-qa-mpnet-base-dot-v1 as the model name.
- How does multi-qa-mpnet-base-dot-v1 compare to other Sentence Transformers QA models?
  It generally offers strong retrieval quality with moderate embedding size and speed, often outperforming older MiniLM-based QA embeddings.
- Can I use multi-qa-mpnet-base-dot-v1 for multilingual queries?
  It works best for English. It is not specifically optimized for broad multilingual coverage, so performance on other languages may be weaker.
- What are the main limitations of multi-qa-mpnet-base-dot-v1?
  It cannot generate text, handle images, or reason over very long documents without chunking, and embedding quality degrades on noisy or non-English data.
