Powered by BAAI
bge-base-en-v1.5
bge-base-en-v1.5 is a base-sized English text embedding model from BAAI’s BGE (BAAI General Embedding) series, optimized for semantic similarity and retrieval. It generates 768-dimensional embeddings for tasks like search, clustering, and reranking.
About the model
What is bge-base-en-v1.5?
bge-base-en-v1.5 is an English language embedding model developed by BAAI as part of its BGE general embedding series, transforming text into 768-dimensional vectors optimized for semantic similarity. It is mainly used for information retrieval and semantic search, where both queries and documents are embedded into a shared vector space for relevance ranking. It is also applied in downstream tasks such as clustering, reranking, and recommendation systems that rely on dense text representations. It belongs to the FlagEmbedding/BGE family alongside related variants like bge-small-en-v1.5 and bge-large-en-v1.5.
Model capabilities
5 Core Capabilities
- Text Embeddings: Converts English sentences and passages into 768-dimensional dense vectors capturing semantic meaning for downstream similarity-based applications.
- Semantic Search: Supports semantic search by embedding queries and documents into a shared space, enabling retrieval by meaning rather than exact keywords.
- Sentence Similarity: Measures similarity between English texts by comparing their embeddings, useful for clustering, deduplication, and paraphrase detection pipelines.
- Document Retrieval: Optimized for text retrieval tasks, ranking relevant passages or documents for a given query using vector similarity scores.
- RAG Integration: Acts as the embedding backbone in retrieval-augmented generation systems, efficiently indexing and retrieving knowledge for larger language models.
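To make the retrieval capabilities above concrete, here is a minimal sketch of ranking documents by cosine similarity. The 3-dimensional vectors are toy stand-ins for real 768-dimensional embeddings, and the function names and document ids are hypothetical, chosen only for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)

def rank_documents(query_vec, doc_vecs, top_k=3):
    """Return (doc_id, score) pairs sorted by descending similarity."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy vectors (assumption): real embeddings would have 768 values each.
docs = {
    "reset-password": [0.9, 0.1, 0.0],
    "billing-faq": [0.1, 0.9, 0.1],
    "api-limits": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]

ranking = rank_documents(query, docs, top_k=2)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

In a real pipeline the query and document vectors would come from the model, and a vector index would typically replace the linear scan once the corpus grows.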
Use cases
6 Most Valuable Use Cases
- Semantic Document Search
- Question Answer Retrieval
- Text Clustering Analysis
- RAG Knowledge Base
- Recommendation Matching
- Duplicate Ticket Detection
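As one illustration of these use cases, duplicate ticket detection can be sketched as greedy grouping by a similarity threshold. This is a simplified approach under stated assumptions: the ticket ids, the toy 3-dimensional vectors, and the 0.95 threshold are all hypothetical, and a suitable threshold would need to be tuned on real data.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0

def group_near_duplicates(embeddings, threshold=0.9):
    """Greedily assign each item to the first group whose representative
    (the group's first member) is at least `threshold` similar."""
    groups = []
    for item_id, vec in embeddings.items():
        for group in groups:
            if cosine(vec, embeddings[group[0]]) >= threshold:
                group.append(item_id)
                break
        else:
            # No existing group was close enough: start a new one.
            groups.append([item_id])
    return groups

# Toy vectors (assumption) standing in for 768-dimensional ticket embeddings.
tickets = {
    "T1": [1.0, 0.0, 0.1],
    "T2": [0.98, 0.05, 0.12],
    "T3": [0.0, 1.0, 0.0],
}
groups = group_near_duplicates(tickets, threshold=0.95)
print(groups)
```

Greedy grouping depends on input order and compares only against each group's first member, so it is a quick first pass rather than a full clustering method.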
Transparent pricing
Cost Comparison
Among the providers listed below, LLM API shows the lowest price per 1M tokens, the lowest latency, and the highest throughput for bge-base-en-v1.5-class models.
| Provider | Region | Latency | Throughput | Uptime | Input ($/1M) | Output ($/1M) | Context |
|---|---|---|---|---|---|---|---|
| LLM API (best) | Global | ~80ms | ~8,000 tps | 99.99% | $0.0100 | $0.0100 | 8K tokens |
| BAAI | Global | ~140ms | ~4,000 tps | ~99.9% | ~$0.0130 | ~$0.0130 | 8K tokens |
| OpenAI | Global | ~160ms | ~3,000 tps | 99.9% | ~$0.0200 | ~$0.0200 | 8K tokens |
| Azure AI | US East | ~170ms | ~2,500 tps | 99.9% | ~$0.0220 | ~$0.0220 | 8K tokens |
| Replicate | Global | ~190ms | ~2,000 tps | ~99.5% | ~$0.0250 | ~$0.0250 | 8K tokens |
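The per-token prices above translate into a monthly bill with simple arithmetic: tokens divided by one million, multiplied by the listed rate. The sketch below uses the input prices exactly as listed in the table; the 500M-token workload and the function names are hypothetical.

```python
# Listed input prices in USD per 1M tokens, copied from the table above.
LISTED_PRICE_PER_MILLION = {
    "LLM API": 0.0100,
    "BAAI": 0.0130,
    "OpenAI": 0.0200,
    "Azure AI": 0.0220,
    "Replicate": 0.0250,
}

def estimate_cost(tokens, price_per_million):
    """Cost in USD for embedding `tokens` tokens at the given rate."""
    return tokens / 1_000_000 * price_per_million

def compare_providers(tokens):
    """Return (provider, cost) pairs sorted from cheapest to most expensive."""
    costs = {name: estimate_cost(tokens, price)
             for name, price in LISTED_PRICE_PER_MILLION.items()}
    return sorted(costs.items(), key=lambda item: item[1])

monthly_tokens = 500_000_000  # hypothetical workload, not a measured figure
for provider, cost in compare_providers(monthly_tokens):
    print(f"{provider}: ${cost:.2f}")
```

Listed prices are approximate and may change, so treat the result as an estimate rather than a quote.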
Performance benchmarks
Technical Specifications
| Metric | bge-base-en-v1.5 (BAAI) | all-MiniLM-L6-v2 (SBERT) | text-embedding-3-small (OpenAI) |
|---|---|---|---|
| Dimensions | 768 | 384 | 1536 |
| Max Input Tokens | ~512 | ~256 | 8K |
| Price per 1M Tokens | ~$0.05 | ~$0.00 | ~$0.02 |
| Avg Latency per 1K Tokens | ~80ms | ~60ms | ~90ms |
| Throughput | ~2.5K tps | ~3K tps | ~2K tps |
| Uptime | ~99.5% | ~99.0% | ~99.9% |
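The Dimensions row has a direct effect on index size. As a rough sketch, assuming each value is stored as an uncompressed 32-bit float (an assumption, since many vector stores quantize or compress), storage grows linearly with dimension count. The helper names below are hypothetical.

```python
BYTES_PER_FLOAT32 = 4  # assumption: uncompressed 32-bit floats

def vector_bytes(dimensions, bytes_per_value=BYTES_PER_FLOAT32):
    """Raw storage for a single embedding vector."""
    return dimensions * bytes_per_value

def index_size_bytes(num_vectors, dimensions, bytes_per_value=BYTES_PER_FLOAT32):
    """Raw storage for `num_vectors` embeddings, excluding index overhead."""
    return num_vectors * vector_bytes(dimensions, bytes_per_value)

# Dimensions taken from the table above.
MODELS = {
    "bge-base-en-v1.5": 768,
    "all-MiniLM-L6-v2": 384,
    "text-embedding-3-small": 1536,
}

for name, dims in MODELS.items():
    gib = index_size_bytes(1_000_000, dims) / 2**30
    print(f"{name}: {vector_bytes(dims)} bytes/vector, {gib:.2f} GiB per 1M vectors")
```

Under these assumptions a 768-dimensional vector takes 3,072 bytes, so one million bge-base-en-v1.5 embeddings need about 2.86 GiB before any index overhead.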
30-day usage via LLM API
- 620M embedding tokens processed (30 days)
- 5.4M API requests (30 days)
- 41.5K active developer accounts (30 days)
- 99.95% average API uptime (30 days)
Architecture & Integration
Why Build on LLM.API?
One unified API. Every major model. Built-in reliability, cost control, and observability.
- Unified AI Routing: Automatically route each request to the optimal provider and model based on latency, cost, or performance policies, without changing your application code. One endpoint, every model.
- Cost-Aware Orchestration: Control spend with per-route budgets, transparent usage metrics, and intelligent downshifting to cheaper models when quality thresholds are safely met. Optimize spend by default.
- Resilient Fallback Flows: Define multi-provider fallbacks that auto-trigger on errors, timeouts, or degraded responses so your critical AI paths keep working in production. No single point of failure.
- End-to-End Observability: Trace every request across models and providers with logs, metrics, and structured events to debug failures, tune prompts, and prove SLAs. See every token hop.
- Task-Level Abstractions: Codify tasks like chat, generation, ranking, and tools once, then swap models or providers behind the scenes without touching business logic. Code to tasks, not models.
- High-Throughput Batch APIs: Ship massive workloads through a single batch call with automatic chunking, retries, and concurrency control tuned for throughput and reliability. Batch at production scale.
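The fallback idea described above can be sketched in a few lines. This is a conceptual illustration only, not LLM.API's implementation: the provider names, the callables, and the `embed_with_fallback` helper are all hypothetical, and a production version would add backoff, timeouts, and narrower exception handling.

```python
def embed_with_fallback(text, providers, max_attempts_per_provider=2):
    """Try each (name, embed_fn) pair in order, retrying a provider a few
    times before moving on. Returns (provider_name, embedding)."""
    errors = []
    for name, embed_fn in providers:
        for attempt in range(1, max_attempts_per_provider + 1):
            try:
                return name, embed_fn(text)
            except Exception as exc:  # sketch only: catch narrower errors in practice
                errors.append((name, attempt, repr(exc)))
    raise RuntimeError(f"all providers failed: {errors}")

# Hypothetical providers: the primary always times out, the backup succeeds.
def flaky_primary(text):
    raise TimeoutError("primary provider timed out")

def stable_backup(text):
    return [0.1, 0.2, 0.3]  # toy vector standing in for a 768-dim embedding

used, vector = embed_with_fallback(
    "How do I reset my password?",
    [("primary", flaky_primary), ("backup", stable_backup)],
)
print(used, vector)
```

Keeping the provider list as data means the order can be changed by policy without touching the calling code, which is the design choice the routing features above describe.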
Decision guide
When to Use — When NOT to Use
Use it if...
- You need a strong English sentence-embedding model for general semantic similarity tasks.
- You need inexpensive, fast vectorization for large-scale retrieval or RAG pipelines.
- Your use case involves clustering or deduplicating many short English texts or titles.
- Your use case involves building semantic search over FAQs, documentation, or support tickets.
- You need a widely adopted open-source baseline embedding model with good community benchmarks.
- Your use case involves re-ranking small candidate sets using cosine similarity of embeddings.
Avoid if...
- You need multilingual embeddings beyond English, covering many languages with consistent performance.
- Your workload requires domain-specialized embeddings, like biomedical or legal text understanding.
- You need cross-modal embeddings aligning text with images, audio, or other modalities.
- You need extremely high-dimensional, state-of-the-art embeddings for nuanced reasoning-heavy tasks.
- Your workload requires very long-context document representation beyond what base models handle well.
- You need supervised task-specific models, such as direct question answering or classification.
FAQ
Frequently Asked Questions
- What is bge-base-en-v1.5?
  bge-base-en-v1.5 is a 768-dimensional English text embedding model from BAAI optimized for retrieval, semantic search, and text similarity tasks.
- What is bge-base-en-v1.5 best suited for when used via LLM.API?
  It is best suited for building vector search, dense retrieval, reranking pipelines, semantic clustering, and recommendation systems on English text.
- What context window should I assume when using bge-base-en-v1.5 for embeddings?
  bge-base-en-v1.5 accepts inputs of up to 512 tokens, so chunk longer documents before embedding them.
- What modalities does bge-base-en-v1.5 support?
  bge-base-en-v1.5 supports only text-to-vector embeddings and does not handle images, audio, or code execution.
- How is bge-base-en-v1.5 priced on LLM.API?
  Pricing is usage-based per embedded token and may differ from BAAI’s own deployment, so check the LLM.API pricing page for current rates.
- What latency should I expect from bge-base-en-v1.5 on LLM.API?
  You can generally expect sub-second latency for short texts, depending on request batch size and your network conditions.
- How do I call bge-base-en-v1.5 through LLM.API?
  Specify the model name "bge-base-en-v1.5" in the embeddings endpoint of LLM.API and pass your English text as input.
- How does bge-base-en-v1.5 compare to larger BGE models?
  Compared to larger BGE variants, it offers smaller embeddings and faster inference at the cost of slightly lower retrieval accuracy.
- Can I use bge-base-en-v1.5 for multilingual text?
  It is trained primarily on English, so performance on non-English text will generally be weaker than on English inputs.
- What limitations should I be aware of when using bge-base-en-v1.5?
  It does not generate text, may lose information on very long inputs, and its embeddings can reflect biases present in training data.
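The chunking and request advice above can be sketched as follows. Two assumptions are worth calling out: whitespace-separated words are used as a rough proxy for tokens (real tokenizers usually produce more tokens than words, hence the conservative 300-word default), and the request body follows a common `{"model", "input"}` shape that should be checked against the LLM.API documentation. Both helper names are hypothetical.

```python
import json

def chunk_words(text, max_words=300, overlap=50):
    """Split text into overlapping chunks of at most `max_words` words.
    Words are only a rough proxy for tokens (assumption)."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be >= 0 and smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the current chunk already reaches the end of the text
    return chunks

def build_embedding_request(chunks, model="bge-base-en-v1.5"):
    """Serialize a request body. The field names are an assumed shape,
    not a confirmed LLM.API schema."""
    return json.dumps({"model": model, "input": chunks})

document = " ".join(f"word{i}" for i in range(1000))
chunks = chunk_words(document, max_words=300, overlap=50)
body = build_embedding_request(chunks)
print(len(chunks), "chunks;", len(body), "bytes in request body")
```

The overlap keeps sentences that straddle a chunk boundary represented in both neighbors, at the cost of embedding some words twice.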
