Powered by Sentence Transformers

all-mpnet-base-v2

  • Sentence Similarity

all-mpnet-base-v2 is a widely used English sentence-embedding model from Sentence Transformers that maps text to 768-dimensional vectors for semantic similarity tasks. It is built on Microsoft’s MPNet architecture and fine-tuned on over a billion sentence pairs for strong general-purpose performance.


What is all-mpnet-base-v2?

all-mpnet-base-v2 is an English sentence-transformer model that encodes sentences and short paragraphs into 768-dimensional dense vector embeddings. It is mainly used for semantic search and retrieval in applications like RAG pipelines, documentation search, and information retrieval systems. It is also commonly applied to clustering, deduplication, and semantic similarity scoring across large text collections. The model is part of the Sentence Transformers family and was fine-tuned from the pretrained microsoft/mpnet-base checkpoint with a contrastive objective on over a billion sentence pairs.
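A minimal sketch of encoding text locally with the open-source `sentence-transformers` Python package. This is an assumption about how you might run the model yourself from its public Hub ID `sentence-transformers/all-mpnet-base-v2`, not a description of how LLM.API serves it. The encoder is loaded lazily, so the plain-Python `cosine_similarity` helper also works on vectors obtained elsewhere.

```python
import math
from functools import lru_cache
from typing import List, Sequence

# Public Hugging Face Hub ID of the model; weights (~420MB) download on first use.
MODEL_NAME = "sentence-transformers/all-mpnet-base-v2"


@lru_cache(maxsize=1)
def _load_model():
    # Imported lazily: requires `pip install sentence-transformers`.
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer(MODEL_NAME)


def embed(texts: List[str]) -> List[List[float]]:
    """Return one 768-dimensional, unit-length vector per input text."""
    vectors = _load_model().encode(texts, normalize_embeddings=True)
    return vectors.tolist()  # numpy array of shape (len(texts), 768) -> nested lists


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)
```

For example, `a, b = embed(["How do I reset my password?", "I forgot my login password."])` followed by `cosine_similarity(a, b)` should typically score paraphrases like these higher than unrelated sentence pairs. Because `normalize_embeddings=True` makes every vector unit-length, the cosine here equals a plain dot product.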

5 Core Capabilities

  • Sentence Embeddings

    Generates dense vector embeddings for sentences and short texts, capturing semantic meaning for downstream similarity and retrieval tasks.

  • Semantic Search

    Enables semantic search by embedding queries and documents into a shared space, supporting meaning-based retrieval beyond exact keyword matching.

  • Text Clustering

    Supports clustering of documents or sentences by embedding them into vectors, enabling grouping of semantically similar texts at scale.

  • Text Classification

    Provides embeddings that can serve as input features for lightweight classifiers (for example, logistic regression) trained on downstream text classification tasks.

  • Duplicate Detection

    Identifies near-duplicate or paraphrased sentences by comparing embedding similarity, useful for deduplication and plagiarism detection.
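Semantic search over precomputed embeddings comes down to ranking corpus vectors by their similarity to the query vector. The sketch below is a plain-Python illustration, not LLM.API's implementation. It assumes the vectors were L2-normalized at encoding time (for example with `normalize_embeddings=True` in `sentence-transformers`), so the dot product equals cosine similarity. The 3-dimensional toy vectors stand in for real 768-dimensional embeddings.

```python
import heapq
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def top_k_search(
    query: Sequence[float],
    corpus: List[Sequence[float]],
    k: int = 3,
) -> List[Tuple[int, float]]:
    """Return (corpus_index, score) pairs for the k most similar documents.

    Assumes every vector is unit-length, so the dot product is the cosine similarity.
    """
    scores = ((i, dot(query, doc)) for i, doc in enumerate(corpus))
    return heapq.nlargest(k, scores, key=lambda pair: pair[1])


if __name__ == "__main__":
    corpus = [
        [1.0, 0.0, 0.0],  # stand-in for "How do I reset my password?"
        [0.0, 1.0, 0.0],  # stand-in for "Shipping times for EU orders"
        [0.8, 0.6, 0.0],  # stand-in for "Password reset email never arrived"
    ]
    print(top_k_search([1.0, 0.0, 0.0], corpus, k=2))
```

A linear scan like this is fine for small collections. For larger corpora, `sentence_transformers.util.semantic_search` or a dedicated vector index does the same ranking more efficiently.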

6 Most Valuable Use Cases

  • Semantic Text Search
  • Duplicate Question Detection
  • Legal Case Similarity Search
  • Case Law Monitoring
  • Product Recommendation Matching
  • Embedding-Based NLP
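Duplicate question detection can be sketched as flagging every pair of embeddings whose cosine similarity clears a threshold. The 0.85 default below is a hypothetical starting point, not a recommended value; tune it on labeled pairs from your own data. The pairwise loop is O(n²), which is fine for a few thousand items; `sentence_transformers.util.paraphrase_mining` processes larger collections in chunks.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # Assumes non-zero vectors (real sentence embeddings are never all zeros).
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def find_duplicates(
    vectors: List[Sequence[float]],
    threshold: float = 0.85,  # hypothetical default; tune on your own labeled pairs
) -> List[Tuple[int, int, float]]:
    """Return (i, j, score) for every pair i < j whose cosine similarity is >= threshold."""
    pairs = []
    for i, j in combinations(range(len(vectors)), 2):
        score = cosine(vectors[i], vectors[j])
        if score >= threshold:
            pairs.append((i, j, score))
    # Most similar pairs first.
    return sorted(pairs, key=lambda p: p[2], reverse=True)
```

Flagged pairs can then be merged, linked to a canonical question, or sent for human review, depending on how costly a false positive is in your application.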

Cost Comparison

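LLM API embeddings are roughly 50% to 70% cheaper than the other all-mpnet-base-v2 offerings listed below ($0.02 versus $0.04 to $0.07 per 1M input tokens). Whatever context window a provider lists, the model itself truncates input longer than 384 word pieces by default.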

Provider | Region | Latency | Throughput | Uptime | Input ($/1M tokens) | Output ($/1M tokens) | Context
LLM API (best) | Global | 80 ms | 120k tokens/s | 99.99% | $0.02 | $0.00 | 8,192 tokens
Sentence Transformers (Hosted) | Global | ~220 ms | ~300 tps | ~99.5% | ~$0.05 | $0.00 | ~4,096 tokens
Hugging Face Inference API | EU West | ~250 ms | ~250 tps | 99.9% | ~$0.06 | $0.00 | ~4,096 tokens
Azure AI (MPNet-like Embeddings) | US East | ~200 ms | ~400 tps | 99.9% | ~$0.04 | $0.00 | 4,096 tokens
Replicate | US West | ~260 ms | ~200 tps | ~99.0% | ~$0.07 | $0.00 | ~4,096 tokens
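The per-token prices in the cost comparison translate into a monthly bill with simple arithmetic. A small sketch using the listed input prices (most are approximate) and a hypothetical workload of 500M input tokens per month; output is $0.00 for every provider, so only input tokens are counted.

```python
# Input price in $ per 1M tokens, taken from the cost comparison (most values are approximate).
PRICES_PER_MILLION = {
    "LLM API": 0.02,
    "Sentence Transformers (Hosted)": 0.05,
    "Hugging Face Inference API": 0.06,
    "Azure AI (MPNet-like Embeddings)": 0.04,
    "Replicate": 0.07,
}


def monthly_cost(input_tokens: int, price_per_million: float) -> float:
    """Dollar cost of embedding input_tokens at a given $/1M-token rate."""
    return input_tokens / 1_000_000 * price_per_million


def percent_savings(cheaper: float, baseline: float) -> float:
    """How much cheaper `cheaper` is than `baseline`, as a percentage."""
    return (1 - cheaper / baseline) * 100


if __name__ == "__main__":
    tokens = 500_000_000  # hypothetical monthly volume
    ours = PRICES_PER_MILLION["LLM API"]
    for provider, price in PRICES_PER_MILLION.items():
        print(f"{provider}: ${monthly_cost(tokens, price):,.2f}/month, "
              f"LLM API saves {percent_savings(ours, price):.0f}%")
```

At these rates the savings range from 50% (against $0.04) to about 71% (against $0.07).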

Technical Specifications

Metric | all-mpnet-base-v2 (Sentence Transformers) | bert-base-nli-mean-tokens (Sentence Transformers) | paraphrase-MiniLM-L6-v2 (Sentence Transformers)
Dimensions | 768 | 768 | 384
Max input tokens | 384 (longer input is truncated by default) | ~128 | ~256
Price per 1M tokens | ~$0.10 (self-hosted infra only) | ~$0.09 (self-hosted infra only) | ~$0.07 (self-hosted infra only)
Avg latency (per 128-token input on GPU) | ~6 ms | ~8 ms | ~4 ms
Throughput (embeddings/s on a single GPU) | ~4,000/s | ~3,000/s | ~6,000/s
Model size | ~420 MB | ~420 MB | ~90 MB
Training domain | General English, ~1B sentence pairs from diverse sources | General English NLI | General English paraphrase mining
Uptime (self-hosted, well-managed) | ~99.5% | ~99.5% | ~99.5%
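Embedding dimensionality also drives the storage cost of a vector index. Assuming uncompressed float32 storage (4 bytes per value; this ignores index overhead and any compression a vector database might apply), the arithmetic for the dimensions in the specifications is:

```python
BYTES_PER_FLOAT32 = 4


def index_size_bytes(num_vectors: int, dimensions: int) -> int:
    """Raw storage for num_vectors float32 embeddings, excluding index overhead."""
    return num_vectors * dimensions * BYTES_PER_FLOAT32


if __name__ == "__main__":
    for name, dims in [("all-mpnet-base-v2", 768), ("paraphrase-MiniLM-L6-v2", 384)]:
        gb = index_size_bytes(1_000_000, dims) / 1e9
        print(f"{name}: {dims * BYTES_PER_FLOAT32} bytes/vector, {gb:.2f} GB per 1M vectors")
```

At 768 dimensions each vector takes 3,072 bytes, about 3.07 GB per million vectors, which is twice the raw footprint of the 384-dimensional MiniLM model.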

30-day usage via LLM API

3.8B
Text pairs embedded in last 30 days
21M
API requests served in last 30 days
410K
Developers using this model monthly
99.9%
Avg API uptime over last 30 days

Why Build on LLM.API?

One unified API. Every major model. Built-in reliability, cost control, and observability.

  • Intelligent Model Routing

    Automatically route each request to the best model across providers based on task, latency, and reliability—no client changes required as your stack evolves.

    One endpoint, any model
  • Cost-Aware Orchestration

    Optimize for price and performance with per-request cost controls, dynamic model selection, and transparent usage insights that keep your AI bill predictable.

    Cut cost, keep quality
  • Automatic Provider Fallback

    Survive provider outages and rate limits with built-in failover logic that retries on alternate models, preserving SLAs without custom recovery code.

    Resiliency by default
  • Full-Stack Observability

    Track latency, errors, tokens, and provider performance across every request with unified logs, traces, and metrics wired for your existing monitoring stack.

    See every token
  • Task-Level Abstractions

    Call high-level tasks—chat, tools, RAG, vision—instead of provider-specific APIs, so you can swap models without rewriting business logic or prompt glue.

    Code to tasks, not vendors
  • High-Throughput Batch Jobs

    Run massive batch workloads through a single endpoint with concurrency controls, retries, and progress tracking designed for production-scale pipelines.

    Ship bulk, stay fast

When to Use — When NOT to Use

Use it if...

  • You need robust general-purpose sentence embeddings for semantic similarity and clustering tasks.
  • You need to power semantic search over short to medium-length English texts efficiently.
  • Your use case involves intent classification or FAQ matching using dense vector similarity.
  • You need a well-known, widely-benchmarked baseline model for sentence-level embedding experiments.
  • Your use case involves building recommendation systems based on textual description similarity.
  • You need to deduplicate or cluster large corpora of short documents by semantic closeness.
  • Your use case involves zero-shot text matching by comparing query and label descriptions directly.
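FAQ matching and zero-shot label matching follow the same pattern: embed each candidate label (or canonical FAQ question), embed the incoming text, and pick the closest candidate, returning no match when even the best score is too low. In this sketch the `min_score` floor of 0.5 and the label names are hypothetical, and the vectors are assumed to be unit-length so the dot product is cosine similarity.

```python
from typing import Dict, Optional, Sequence, Tuple


def best_label(
    text_vec: Sequence[float],
    label_vecs: Dict[str, Sequence[float]],
    min_score: float = 0.5,  # hypothetical floor; tune on held-out examples
) -> Tuple[Optional[str], float]:
    """Return the closest label and its score, or (None, score) if nothing clears min_score."""
    best_name, best_score = None, float("-inf")
    for name, vec in label_vecs.items():
        score = sum(x * y for x, y in zip(text_vec, vec))  # cosine for unit vectors
        if score > best_score:
            best_name, best_score = name, score
    if best_score < min_score:
        return None, best_score
    return best_name, best_score
```

The floor matters in practice: without it, every input is forced onto some label, even text that matches none of them.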

Avoid if...

  • You need to process very long documents end-to-end; input longer than 384 word pieces is truncated by default.
  • Your workload requires state-of-the-art multilingual performance across many non-English languages.
  • You need embeddings specifically optimized for code, images, audio, or multimodal inputs.
  • Your workload requires continuously updated embeddings reflecting very recent domain-specific knowledge.
  • You need task-specific fine-tuning with integrated training pipelines rather than an off-the-shelf encoder.
  • Your workload requires strict on-device inference with extremely constrained memory and compute resources.
  • You need strong domain adaptation out-of-the-box for highly specialized technical or legal text.
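When a document exceeds the model's input limit, a common workaround is to split it into overlapping windows, embed each window, and search over the chunks (or average their vectors). The sketch below approximates tokens with whitespace-separated words, which is an assumption: MPNet's tokenizer usually produces more word pieces than there are words, so the 200-word default is a hypothetical value meant to stay under the 384-word-piece limit for typical English prose.

```python
from typing import List


def chunk_words(text: str, max_words: int = 200, overlap: int = 50) -> List[str]:
    """Split text into windows of at most max_words words; consecutive windows share `overlap` words."""
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("require max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    chunks = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reaches the end of the text
    return chunks
```

In `sentence-transformers`, the active limit is exposed as `model.max_seq_length` (384 for this model by default). Counting tokens with the model's own tokenizer is more accurate than the word approximation.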

Frequently Asked Questions

  • What is all-mpnet-base-v2?

    all-mpnet-base-v2 is a Sentence Transformers text-embedding model based on MPNet, optimized for high-quality, general-purpose similarity between sentences and short paragraphs.

  • What is all-mpnet-base-v2 best used for?

    It is best for semantic search, clustering, deduplication, recommendation, and textual similarity tasks where short-to-medium English sentences or paragraphs are compared.

  • What modalities does all-mpnet-base-v2 support?

    all-mpnet-base-v2 is text-only and generates fixed-size vector embeddings from input text; it does not process images, audio, or other modalities.

  • What is the embedding dimensionality and context window of all-mpnet-base-v2?

    The model outputs 768-dimensional embeddings. By default, input longer than 384 word pieces (tokens) is truncated, so it works best on sentences and short paragraphs.

  • How fast is all-mpnet-base-v2 when called through LLM.API?

    Latency depends on input size and region, but LLM.API routes to optimized Sentence Transformers runtimes for low-latency, high-throughput embedding generation.

  • How is pricing for all-mpnet-base-v2 handled on LLM.API?

    Usage is billed per input token at LLM.API’s standard embedding rate for this model, listed as $0.02 per 1M input tokens with no output-token charge in the cost comparison; current rates appear in your LLM.API dashboard.

  • How do I access all-mpnet-base-v2 via the LLM.API?

    Call the LLM.API embeddings endpoint with provider set to Sentence Transformers and model set to all-mpnet-base-v2, passing your texts in the request body.

  • How does all-mpnet-base-v2 compare to larger Sentence Transformers models?

    With about 110M parameters (~420MB), it is smaller and faster than larger Sentence Transformers models while still performing strongly on many tasks.

  • Does all-mpnet-base-v2 support multilingual text?

    It mainly targets English and may work on related languages, but performance is not guaranteed or optimized for fully multilingual use cases.

  • What are the main limitations of all-mpnet-base-v2?

    It cannot generate or edit text, struggles with very long documents, and performance may degrade on domain-specific or non-English data without adaptation.
