Powered by BAAI
bge-large-en-v1.5
bge-large-en-v1.5 is a large English text embedding model from BAAI’s BGE (BAAI General Embedding) family that maps text into 1,024-dimensional dense vectors, optimized for semantic search and retrieval.
About the model
What is bge-large-en-v1.5?
bge-large-en-v1.5 is a 335M-parameter English sentence embedding model from BAAI that converts text into 1,024-dimensional vectors for similarity-based applications. It is mainly used for dense retrieval in retrieval-augmented generation (RAG) systems, semantic search, and document or passage ranking. The model is also applied to clustering, recommendation, and other tasks that rely on high-quality text similarity representations. It belongs to the BGE (BAAI General Embedding) series, a family that includes earlier English and Chinese variants and later multilingual successors such as bge-m3.
Model capabilities
5 Core Capabilities
- Text Embeddings: Generates 1,024-dimensional English text embeddings for sentences, paragraphs, and document passages using an encoder-only architecture.
- Semantic Search: Supports semantic search and retrieval by mapping related English texts to nearby vectors in embedding space.
- Document Retrieval: Enables retrieval-augmented generation and knowledge base lookup by encoding English documents, split into passages where needed, into dense representations.
- Similarity Matching: Performs sentence and document similarity scoring, clustering, and reranking based on distances between embedding vectors.
- English Only: Specialized for English-language inputs and tuned for monolingual English embedding tasks.
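As a rough illustration of similarity matching, the sketch below ranks documents by cosine similarity to a query vector. It uses toy 3-dimensional vectors in place of real 1,024-dimensional embeddings; the function names and document IDs are hypothetical, and cosine similarity is assumed as the distance measure rather than taken from this page.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs, top_k=3):
    """Return up to top_k (doc_id, score) pairs, most similar first."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors standing in for real embeddings (illustrative values only).
toy_docs = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.1],
    "careers": [0.0, 0.1, 0.9],
}
toy_query = [0.8, 0.2, 0.0]

ranking = rank_by_similarity(toy_query, toy_docs, top_k=2)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

At production scale, a vector database would normally replace the linear scan, but the scoring idea is the same.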
Use cases
6 Most Valuable Use Cases
- Semantic Text Search
- RAG Knowledge Retrieval
- Legal Case Retrieval
- Monitoring Similar Documents
- Product Recommendation Matching
- Clustering Text Embeddings
Transparent pricing
Cost Comparison
In the comparison below, LLM.API lists the lowest input price and the highest uptime for bge-large-en-v1.5-class models.
| Provider | Region | Latency | Throughput | Uptime | Input ($/1M) | Output ($/1M) | Context |
|---|---|---|---|---|---|---|---|
| LLM.API (best) | Global | 80ms | 6000 tps | 99.99% | $0.02 | $0.00 | 8K tokens |
| BAAI | Global | ~150ms | ~2000 tps | ~99.9% | $0.04 | $0.00 | 8K tokens |
| OpenAI | Global | ~180ms | ~3000 tps | 99.9% | $0.10 | $0.00 | 8K tokens |
| Azure AI | US East | ~200ms | ~2500 tps | 99.9% | $0.09 | $0.00 | 8K tokens |
| AWS Bedrock | US West | ~190ms | ~2200 tps | 99.9% | $0.08 | $0.00 | 8K tokens |
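To turn the per-million-token prices above into a budget figure, multiply the token volume (in millions) by the input price; output is billed at $0.00 in every row, so only input tokens matter. The sketch below does this for a hypothetical workload of 500M tokens per month. The function name and the workload size are assumptions for illustration; the prices are copied from the table.

```python
def embedding_cost_usd(total_tokens, price_per_million_usd):
    """Cost in USD for a token volume at a given price per 1M tokens."""
    if total_tokens < 0 or price_per_million_usd < 0:
        raise ValueError("token count and price must be non-negative")
    return total_tokens / 1_000_000 * price_per_million_usd


# Input prices ($ per 1M tokens) as listed in the comparison table.
input_prices = {
    "LLM.API": 0.02,
    "BAAI": 0.04,
    "OpenAI": 0.10,
    "Azure AI": 0.09,
    "AWS Bedrock": 0.08,
}

monthly_tokens = 500_000_000  # hypothetical workload, not a figure from this page

monthly_costs = {
    provider: round(embedding_cost_usd(monthly_tokens, price), 2)
    for provider, price in input_prices.items()
}

for provider, cost in sorted(monthly_costs.items(), key=lambda item: item[1]):
    print(f"{provider}: ${cost:.2f}")
```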
Performance benchmarks
Technical Specifications
| Metric | bge-large-en-v1.5 (BAAI) | text-embedding-3-large (OpenAI) | e5-large-v2 (intfloat) |
|---|---|---|---|
| Dimensions | 1024 | 3072 | 1024 |
| Max Input Tokens | 512 | 8K | 512 |
| Price per 1M Tokens | ~$0.10 | $0.13 | ~$0.10 |
| Avg Latency | ~120ms | ~180ms | ~140ms |
| Throughput | ~1.2K tps | ~1K tps | ~900 tps |
| Uptime | ~99.5% | ~99.9% | ~99.0% |
30-day usage via LLM.API
- 2.8B prompt tokens processed (30 days)
- 9.4M API requests served (30 days)
- 210K monthly active developers
- 99.95% average API uptime (30 days)
Architecture & Integration
Why Build on LLM.API?
One unified API. Every major model. Built-in reliability, cost control, and observability.
- Intelligent Model Routing: Define policies once and let LLM.API route each request to the optimal model across providers based on latency, cost, and quality, with no client changes required. One policy, many models.
- Cost-Aware Execution: Control spend with per-project price caps, smart model selection, and detailed usage insights so you can scale traffic without surprise bills or manual tuning. Optimize spend by default.
- Resilient Fallback Flows: Automatically fail over to backup models and regions on errors or timeouts, preserving SLAs and user experience without adding complex retry logic in your code. Stay online, automatically.
- End-to-End Observability: Trace every request across models, providers, and regions with structured logs, metrics, and latency breakdowns to debug issues and tune performance in production. See every token hop.
- Task-Level Abstractions: Describe tasks like chat, tools, RAG, or scoring once and let LLM.API normalize prompts, parameters, and outputs across incompatible providers and model formats. Task-first, not model-first.
- High-Throughput Batch Jobs: Run massive batch workloads through a single API with automatic chunking, concurrency limits, retries, and progress tracking, without maintaining custom pipelines. Ship at batch scale.
Decision guide
When to Use — When NOT to Use
Use it if...
- You need strong English sentence embeddings for semantic search and retrieval-augmented generation.
- You need dense vector representations to power similarity search over large text corpora.
- Your use case involves clustering or deduplicating English documents based on semantic similarity.
- Your use case involves reranking candidate search results using high-quality embedding similarity scores.
- You need a well-known open-source English embedding model compatible with many vector databases.
- Your use case involves intent matching between user queries and short English text descriptions.
- You need embeddings for question-answer retrieval across FAQ pages or knowledge-base articles.
Avoid if...
- You need a generative model capable of producing text, code, or structured outputs directly.
- Your workload requires multilingual or non-English embeddings with strong performance across many languages.
- You need ultra-long context understanding for very large documents in a single pass.
- Your workload requires strict on-device or mobile deployment with very limited memory footprint.
- You need task-specific fine-tuned embeddings for domains like code, biology, or legal text.
- You need real-time personalization where embeddings must frequently update during a single session.
- Your workload requires built-in safety classification or content filtering instead of separate moderation models.
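When a document is too long to embed in a single pass, the usual workaround is to split it into overlapping chunks and embed each chunk separately. The sketch below is one minimal way to do that. It counts whitespace-separated words as a stand-in for model tokens, which is only an approximation (the model's own tokenizer usually produces more tokens than words), and the function name and default sizes are assumptions chosen for illustration.

```python
def chunk_words(text, max_words=200, overlap=40):
    """Split text into chunks of at most max_words words.

    Consecutive chunks share `overlap` words so that sentences cut at a
    boundary still appear intact in at least one chunk.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must satisfy 0 <= overlap < max_words")

    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once this chunk has reached the end of the text.
        if start + max_words >= len(words):
            break
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
for chunk in chunk_words(sample, max_words=4, overlap=1):
    print(chunk)
```

For production use, measuring chunk length with the model's actual tokenizer is safer than counting words.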
FAQ
Frequently Asked Questions
- What is bge-large-en-v1.5?
  bge-large-en-v1.5 is an English sentence-embedding model by BAAI optimized for semantic similarity, retrieval, and reranking tasks.
- What is bge-large-en-v1.5 best used for?
  It is best for dense retrieval, semantic search, question-answer retrieval, and clustering English text by meaning rather than exact keywords.
- What is the embedding dimension of bge-large-en-v1.5?
  bge-large-en-v1.5 outputs 1,024-dimensional embeddings for each input text chunk.
- What context length does bge-large-en-v1.5 effectively support?
  The model accepts up to 512 tokens per input, so it works best on short to medium English texts; longer documents should be chunked before embedding.
- How fast is bge-large-en-v1.5 in terms of latency?
  Latency depends on hardware and request batch size, but as a large embedding model it is slower per request than small embedding models.
- What modalities does bge-large-en-v1.5 support?
  bge-large-en-v1.5 is a text-only model that converts English text into dense vector embeddings.
- How do I access bge-large-en-v1.5 through LLM.API?
  Use the LLM.API embeddings endpoint, specifying provider "BAAI" and model "bge-large-en-v1.5" in your request parameters.
- How is pricing for bge-large-en-v1.5 handled on LLM.API?
  Pricing is metered per token or character for embedding requests, and the exact rate is defined by LLM.API’s BAAI pricing schedule.
- How does bge-large-en-v1.5 compare to smaller embedding models?
  It generally offers higher retrieval accuracy and semantic quality than smaller embedding models at the cost of higher compute and latency.
- Can bge-large-en-v1.5 handle multilingual input?
  It is primarily optimized for English; embeddings for other languages may be lower quality and are not the main target use case.
- What are the main limitations of bge-large-en-v1.5?
  It produces only embeddings, not text, and may underperform on very long documents or non-English content without careful preprocessing.
- Can I use bge-large-en-v1.5 for real-time applications?
  Yes, but you should benchmark latency on your infrastructure and consider batching or caching to meet strict real-time requirements.
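The batching and caching advice in the last answer can be sketched as a thin wrapper around whatever function actually calls the embeddings endpoint. Everything here is illustrative: `CachedEmbedder`, `build_request`, and `fake_send` are hypothetical names, the `provider` and `model` fields follow the FAQ wording, and the `input` field name is an assumption rather than a documented LLM.API parameter. The `send` callable is supplied by the caller, so no endpoint URL or client library is implied.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def build_request(texts: Sequence[str]) -> dict:
    """Assemble a request payload; the "input" field name is an assumption."""
    return {"provider": "BAAI", "model": "bge-large-en-v1.5", "input": list(texts)}


class CachedEmbedder:
    """Embeds texts in batches and remembers vectors for repeated texts."""

    def __init__(self, send: Callable[[dict], List[Vector]], batch_size: int = 32):
        if batch_size <= 0:
            raise ValueError("batch_size must be positive")
        self._send = send
        self._batch_size = batch_size
        self._cache: Dict[str, Vector] = {}

    def embed(self, texts: Sequence[str]) -> List[Vector]:
        # Collect cache misses once each, preserving first-seen order.
        missing = list(dict.fromkeys(t for t in texts if t not in self._cache))
        for i in range(0, len(missing), self._batch_size):
            batch = missing[i:i + self._batch_size]
            vectors = self._send(build_request(batch))
            if len(vectors) != len(batch):
                raise RuntimeError("backend returned a different number of vectors")
            self._cache.update(zip(batch, vectors))
        return [self._cache[t] for t in texts]


# Stand-in backend for demonstration: one fake 1-d vector per text.
sent_payloads = []


def fake_send(payload: dict) -> List[Vector]:
    sent_payloads.append(payload)
    return [[float(len(text))] for text in payload["input"]]


embedder = CachedEmbedder(fake_send, batch_size=2)
vectors = embedder.embed(["a", "bb", "a", "ccc"])
print(len(sent_payloads), "requests sent for", len(vectors), "vectors")
```

An in-memory dictionary is the simplest possible cache; a real deployment would likely bound its size or move it to a shared store.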
