Powered by Intfloat
E5-Base-v2
E5-Base-v2 is an English sentence and paragraph embedding model from Intfloat that encodes text into a 768-dimensional dense vector space, optimized for high-quality semantic similarity tasks.
About the model
What is E5-Base-v2?
E5-Base-v2 is a transformer-based text embedding model that maps English sentences and paragraphs to 768-dimensional vectors for semantic representation. It is mainly used for semantic search, document and passage retrieval, similarity scoring, and clustering in information retrieval systems. It also serves as a backbone encoder for downstream tasks such as reranking, retrieval-augmented generation, and domain-specific search applications. E5-Base-v2 belongs to Intfloat’s E5 family of embedding models, which includes earlier E5 variants and larger v2 models like E5-Large-v2.
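The 768-dimensional sentence vector is derived from per-token outputs. According to the Intfloat model card, E5-Base-v2 embeddings are produced by averaging the encoder's final hidden states over the non-padding tokens and then L2-normalizing the result. The dependency-free sketch below shows only that pooling step and assumes the per-token hidden states are already available. `average_pool` and `l2_normalize` are illustrative names, and the toy 4-dimensional vectors stand in for real 768-dimensional outputs.

```python
import math
from typing import List, Sequence


def average_pool(hidden_states: Sequence[Sequence[float]],
                 attention_mask: Sequence[int]) -> List[float]:
    """Mean of the token vectors whose attention-mask entry is 1 (padding excluded)."""
    dim = len(hidden_states[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(hidden_states, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    if count == 0:
        raise ValueError("attention_mask selects no tokens")
    return [value / count for value in total]


def l2_normalize(vector: Sequence[float]) -> List[float]:
    """Scale to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(v * v for v in vector))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [v / norm for v in vector]


# Toy example: three tokens with 4-dim states; the last token is padding.
states = [[1.0, 0.0, 2.0, 0.0], [3.0, 0.0, 2.0, 0.0], [9.0, 9.0, 9.0, 9.0]]
mask = [1, 1, 0]
pooled = average_pool(states, mask)
embedding = l2_normalize(pooled)
```

In a real pipeline the same two steps run on batched tensors. Normalizing once at indexing time lets every later similarity computation be a plain dot product.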
Model capabilities
5 Core Capabilities
- Text Embedding: Encodes English sentences and paragraphs into 768-dimensional dense vectors for downstream machine learning and NLP applications.
- Semantic Search: Generates embeddings optimized for semantic search, enabling retrieval of relevant documents based on meaning rather than keywords.
- Sentence Similarity: Produces high-quality embeddings suitable for computing semantic similarity scores between sentences, queries, and documents.
- Text Clustering: Supports grouping related texts by embedding them in a shared vector space, facilitating unsupervised clustering and topic exploration.
- Vector Retrieval: Integrates into retrieval pipelines as a dense retriever model, powering vector databases and hybrid search systems.
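Semantic search, similarity scoring, and dense retrieval all come down to comparing embeddings, most commonly with cosine similarity. Below is a minimal brute-force sketch that assumes the corpus embeddings were computed ahead of time. The document ids and 3-dimensional toy vectors are made up for illustration. At scale, a vector database typically replaces this linear scan with an approximate nearest-neighbor index.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float],
          corpus: Dict[str, Sequence[float]],
          k: int = 3) -> List[Tuple[str, float]]:
    """Score every document against the query and return the best k, highest first."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in corpus.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy 3-dim vectors standing in for 768-dim E5-Base-v2 embeddings.
corpus = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.1],
    "return-window": [0.8, 0.2, 0.1],
}
query = [1.0, 0.1, 0.0]
results = top_k(query, corpus, k=2)
```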
Use cases
6 Most Valuable Use Cases
- Semantic Text Search
- Document Retrieval for RAG
- Text Similarity Scoring
- Content Clustering
- Topic Classification
- Change / Trend Monitoring
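For Content Clustering, one common choice with normalized embeddings is spherical k-means. This is ordinary k-means that uses cosine similarity in place of Euclidean distance. It is one illustrative technique and is not something the model itself prescribes. The sketch uses naive first-k initialization and toy 2-dimensional vectors; production code would more likely use a library implementation with better seeding.

```python
import math
from typing import List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so cosine similarity becomes a dot product."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def nearest(point: Sequence[float], centroids: List[List[float]]) -> int:
    """Index of the centroid most similar to the point."""
    return max(range(len(centroids)), key=lambda c: dot(point, centroids[c]))


def spherical_kmeans(vectors: List[Sequence[float]], k: int,
                     iterations: int = 10) -> List[int]:
    """Cluster embeddings by cosine similarity; returns one label per vector."""
    points = [normalize(v) for v in vectors]
    # Naive initialization: the first k points become the starting centroids.
    centroids = [list(points[i]) for i in range(k)]
    labels: List[int] = []
    for _ in range(iterations):
        # Assignment step: each point joins its most similar centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: average each cluster's members, then re-normalize.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                mean = [sum(column) / len(members) for column in zip(*members)]
                centroids[c] = normalize(mean)
    return labels


# Toy 2-dim vectors standing in for 768-dim E5-Base-v2 embeddings.
docs = [[1.0, 0.1], [0.0, 1.0], [0.9, 0.2], [0.1, 0.9]]
labels = spherical_kmeans(docs, k=2)
```

After the clusters are reviewed and named, the centroids can also serve as simple topic prototypes for the Topic Classification use case.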
Transparent pricing
Cost Comparison
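Among the providers compared below, LLM API lists the lowest latency and highest throughput, at half of Intfloat's listed per-token price for E5-Base-v2. OpenAI's text-embedding-3-small, a different model, has a lower listed input price ($0.02 per 1M tokens).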
| Provider | Region | Latency | Throughput | Uptime | Input ($/1M) | Output ($/1M) | Context |
|---|---|---|---|---|---|---|---|
| LLM API | Global | 120ms | 6000 tps | 99.99% | $0.05 | $0.00 | 512 tokens |
| Intfloat | Global | ~220ms | ~2500 tps | ~99.9% | ~$0.10 | ~$0.00 | 512 tokens |
| OpenAI (text-embedding-3-small) | Global | ~250ms | ~3000 tps | 99.9% | $0.02 | $0.00 | 8192 tokens |
| Azure OpenAI (embedding) | US East | ~260ms | ~2800 tps | 99.9% | ~$0.025 | $0.00 | 8192 tokens |
| AWS Bedrock (Cohere Embed) | US West | ~300ms | ~2200 tps | 99.9% | ~$0.03 | $0.00 | ~4096 tokens |
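The per-token prices translate directly into indexing cost: cost = tokens / 1,000,000 × price per 1M tokens. The sketch below applies that formula to three rows of the table for a hypothetical 200M-token corpus. The Intfloat price is approximate, as marked in the table. `Decimal` avoids floating-point rounding in currency arithmetic.

```python
from decimal import Decimal

# Listed input prices in USD per 1M tokens, copied from the table above
# (the Intfloat figure is approximate). Output is $0.00 for every row.
PRICE_PER_MILLION = {
    "LLM API": Decimal("0.05"),
    "Intfloat": Decimal("0.10"),
    "OpenAI (text-embedding-3-small)": Decimal("0.02"),
}


def embedding_cost(tokens: int, price_per_million: Decimal) -> Decimal:
    """Input-token cost in USD for embedding the given number of tokens."""
    return Decimal(tokens) * price_per_million / Decimal(1_000_000)


# Hypothetical workload: embedding a 200M-token corpus once.
for provider, price in PRICE_PER_MILLION.items():
    print(f"{provider}: ${embedding_cost(200_000_000, price):.2f}")
```

Re-indexing after a chunking or model change repeats this cost, so it is worth estimating before choosing chunk sizes.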
Performance benchmarks
Technical Specifications
| Metric | E5-Base-v2 (Intfloat) | text-embedding-3-small (OpenAI) | all-MiniLM-L6-v2 (SentenceTransformers) |
|---|---|---|---|
| Dimensions | 768 | 1536 | 384 |
| Max Input Tokens | 512 | 8K | 256 |
| Price per 1M Tokens | ~$0.10 | $0.02 | ~$0.05 |
| Throughput | ~1.5K tps | ~5K tps | ~2K tps |
| Avg Latency | ~80ms | ~50ms | ~60ms |
| Uptime | ~99.5% | ~99.9% | ~99.0% |
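The Dimensions row carries a direct storage cost. Stored as 32-bit floats, each E5-Base-v2 vector takes 768 × 4 = 3,072 bytes. That is twice the size of an all-MiniLM-L6-v2 vector and half the size of a text-embedding-3-small vector. The sketch below assumes float32 storage and counts raw vectors only. Real vector databases add index and metadata overhead, and many also offer float16 or quantized storage.

```python
# Bytes per value for 32-bit float storage (an assumption; float16 or
# quantized storage would shrink these numbers).
BYTES_PER_FLOAT32 = 4

# Output dimensions copied from the table above.
DIMENSIONS = {
    "E5-Base-v2": 768,
    "text-embedding-3-small": 1536,
    "all-MiniLM-L6-v2": 384,
}


def index_size_bytes(num_vectors: int, dimensions: int,
                     bytes_per_value: int = BYTES_PER_FLOAT32) -> int:
    """Raw vector storage only; index structures and metadata add more."""
    return num_vectors * dimensions * bytes_per_value


for model, dims in DIMENSIONS.items():
    gib = index_size_bytes(10_000_000, dims) / 2**30
    print(f"{model}: {gib:.1f} GiB for 10M vectors")
```

For 10M E5-Base-v2 vectors, this comes to about 28.6 GiB of raw float32 data.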
30-day usage via LLM API
- 6.8B prompt tokens processed (30 days)
- 12.5M API requests served (30 days)
- 310K unique developers and teams (30 days)
- 99.8% average API uptime (30 days)
Architecture & Integration
Why Build on LLM.API?
One unified API. Every major model. Built-in reliability, cost control, and observability.
- Unified AI Routing: One endpoint, all models. Dynamically route each request to the best model across providers based on latency, cost, or quality, with one integration and continuously optimized decisions.
- Cost-Aware Orchestration: Reduce AI cost at scale. Control and minimize spend with per-model pricing visibility, routing policies, and automatic fallbacks to cheaper equivalents without code changes.
- Resilient Fallback Logic: Never ship single-vendor risk. Keep your app online with built-in retries and cross-provider failover when a model or region degrades, with no custom reliability code required.
- End-to-End Observability: See every token, everywhere. Trace every call with latency, errors, tokens, and provider breakdowns in one place. Debug, optimize, and compare models with production-grade telemetry.
- Task-Level Abstractions: Think tasks, not providers. Describe what you need (chat, generation, extraction, tools) and let LLM.API select and tune the right models and prompts for each task.
- High-Throughput Batch: Batch at production scale. Submit thousands of requests in a single batch API call with smart chunking, parallelism, and retries to saturate provider capacity safely.
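Resilient Fallback Logic combines retries with cross-provider failover. The gateway's own implementation is not described here. The sketch below only illustrates the general pattern under stated assumptions: each provider is a hypothetical Python callable that embeds a list of texts, and failures surface as exceptions. Backoff doubles the delay after each failed attempt.

```python
import time
from typing import Callable, List, Sequence, Tuple

# Hypothetical provider interface: a callable that embeds a batch of texts.
EmbedFn = Callable[[List[str]], List[List[float]]]


def embed_with_fallback(texts: List[str],
                        providers: Sequence[Tuple[str, EmbedFn]],
                        retries_per_provider: int = 2,
                        base_delay: float = 0.5) -> Tuple[str, List[List[float]]]:
    """Try providers in order, retrying each with exponential backoff.

    Returns the name of the provider that succeeded and its vectors.
    """
    errors: List[str] = []
    for name, embed in providers:
        for attempt in range(retries_per_provider + 1):
            try:
                return name, embed(texts)
            except Exception as exc:  # real code should catch narrower errors
                errors.append(f"{name} attempt {attempt + 1}: {exc!r}")
                if attempt < retries_per_provider:
                    time.sleep(base_delay * 2 ** attempt)
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Usage with stand-in providers: the primary always times out.
def flaky_primary(texts: List[str]) -> List[List[float]]:
    raise TimeoutError("region degraded")


def healthy_backup(texts: List[str]) -> List[List[float]]:
    return [[0.0, 1.0] for _ in texts]


provider, vectors = embed_with_fallback(
    ["hello"], [("primary", flaky_primary), ("backup", healthy_backup)],
    base_delay=0.0)
```

For embeddings specifically, a fallback should serve the same model, such as E5-Base-v2 from another provider or region. Vectors from a different embedding model live in a different space, so mixing them into one index would make similarity scores meaningless.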
Decision guide
When to Use and When NOT to Use
Use it if...
- You need a strong general-purpose text embedding model for semantic search applications.
- Your content is primarily in English, the language E5-Base-v2 is optimized for.
- Your use case involves building retrieval-augmented generation systems needing high-quality dense retrieval.
- You need cost-efficient embeddings for large-scale document indexing and similarity search.
- Your use case involves clustering or topic modeling on short to medium-length text passages.
- You need to power recommendation or matching systems based on semantic text similarity.
- Your use case involves reranking candidate documents using cosine similarity between embeddings.
Avoid if...
- You need a generative language model for text completion, dialogue, or content creation.
- Your workload requires representing very long documents as a single embedding, since inputs longer than 512 tokens are truncated.
- You need domain-specialized embeddings, like code or biomedical text, with state-of-the-art performance.
- Your workload requires real-time token-by-token streaming or interactive conversational behavior.
- You need structured outputs such as JSON, SQL queries, or function-call arguments directly.
- Your workload requires fine-grained token-level tasks like sequence labeling or tagging.
- You need cross-modal embeddings aligning text with images, audio, or video representations.
FAQ
Frequently Asked Questions
What is E5-Base-v2?
E5-Base-v2 is an Intfloat text-embedding model designed for high-quality semantic search, retrieval, and clustering tasks.
What tasks is E5-Base-v2 best suited for?
E5-Base-v2 is best for dense retrieval, semantic similarity, reranking, and building vector search over documents, queries, and short passages.
How is E5-Base-v2 priced when used through LLM.API?
E5-Base-v2 usage on LLM.API is metered per input token under LLM.API’s unified pricing for embedding models. The cost comparison above lists $0.05 per 1M input tokens and no output charge.
What is the context window of E5-Base-v2?
E5-Base-v2 accepts inputs of up to 512 tokens. Longer texts are truncated, so split long documents into chunks before embedding.
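E5-Base-v2 truncates inputs longer than 512 tokens, per the Intfloat model card. For that reason, long documents are usually split into overlapping chunks that are embedded separately. The sketch below approximates tokens with whitespace-separated words, which is an assumption. The model's subword tokenizer often splits one word into several tokens, so the default `max_words=300` leaves headroom under 512. For exact limits, count tokens with the model's own tokenizer instead. `chunk_words` is an illustrative name.

```python
from typing import List


def chunk_words(text: str, max_words: int = 300, overlap: int = 50) -> List[str]:
    """Split text into overlapping windows of at most max_words words.

    Word counts only approximate model tokens; keep max_words well below
    the 512-token limit to allow for subword splitting.
    """
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in [0, max_words)")
    words = text.split()
    step = max_words - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once this window reaches the end of the text.
        if start + max_words >= len(words):
            break
    return chunks
```

The overlap keeps a sentence that straddles a boundary fully inside at least one chunk. The cost is some duplicated tokens at indexing time.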
How fast is E5-Base-v2 in terms of latency?
E5-Base-v2 generally provides low-latency embedding generation suitable for real-time or near-real-time search applications. The cost comparison above lists 120ms latency through LLM API, though actual latency depends on request size and concurrency.
What modalities does E5-Base-v2 support?
E5-Base-v2 is a text-only model that converts text inputs into dense vector embeddings.
How do I call E5-Base-v2 via the LLM.API gateway?
Call LLM.API’s embeddings endpoint with E5-Base-v2 selected as the model, pass your text inputs, and read the embedding vectors from the response.
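The exact endpoint URL, model id, and response format are not documented here. The sketch below therefore assumes an OpenAI-style embeddings format: a JSON body with `model` and `input`, and a response whose `data` items carry `index` and `embedding`. `MODEL_ID` and both function names are hypothetical. The "query: " / "passage: " prefixes follow the recommendation in the E5 model card. Whether the gateway adds them automatically is not stated, so the sketch adds them explicitly. No network call is made; a canned response stands in for the server.

```python
import json
from typing import List

# Hypothetical model id: check LLM.API's model catalog for the real name.
MODEL_ID = "e5-base-v2"


def build_embedding_request(texts: List[str], kind: str) -> str:
    """Build a JSON body for an embeddings call.

    The E5 model card recommends prefixing retrieval queries with "query: "
    and indexed documents with "passage: ".
    """
    if kind not in ("query", "passage"):
        raise ValueError("kind must be 'query' or 'passage'")
    return json.dumps({"model": MODEL_ID,
                       "input": [f"{kind}: {text}" for text in texts]})


def parse_embedding_response(body: str) -> List[List[float]]:
    """Return vectors in input order, assuming an OpenAI-style body:
    {"data": [{"index": 0, "embedding": [...]}, ...]}."""
    items = json.loads(body)["data"]
    return [item["embedding"] for item in sorted(items, key=lambda item: item["index"])]


# Offline usage with a canned response (no network call is made here).
request_body = build_embedding_request(["how do refunds work?"], kind="query")
canned = json.dumps({"data": [{"index": 1, "embedding": [0.3, 0.4]},
                              {"index": 0, "embedding": [0.1, 0.2]}]})
vectors = parse_embedding_response(canned)
```

Sorting by `index` guards against a server returning items out of order, so each vector lines up with the text that produced it.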
How does E5-Base-v2 compare to larger embedding models?
E5-Base-v2 generally offers a strong quality–performance tradeoff, with smaller size and lower cost than many larger embedding models while maintaining competitive retrieval quality.
What are the main limitations of E5-Base-v2?
E5-Base-v2 may struggle with very long documents, highly specialized domains, and tasks requiring generative output or multimodal understanding.
Can I use E5-Base-v2 for multilingual embeddings?
E5-Base-v2 is primarily optimized for English, so performance on other languages may be less reliable compared with dedicated multilingual embedding models.
