bge-base-en-v1.5

Text Embeddings

bge-base-en-v1.5 is a base-sized English text embedding model from BAAI’s BGE (BAAI General Embedding) series, optimized for semantic similarity and retrieval. It generates 768-dimensional embeddings for tasks like search, clustering, and reranking.


API Performance

Latency: ~0.15s avg embedding time per 1K tokens on A100
Context: 512 tokens (maximum sequence length)
Input: $0.00 per 1M tokens (open-source model)
Output: $0.00 per 1M tokens (embeddings only, no generative output)
Uptime: 99%

About the model

What is bge-base-en-v1.5?

bge-base-en-v1.5 is an English language embedding model developed by BAAI as part of its BGE general embedding series, transforming text into 768-dimensional vectors optimized for semantic similarity. It is mainly used for information retrieval and semantic search, where both queries and documents are embedded into a shared vector space for relevance ranking. It is also applied in downstream tasks such as clustering, reranking, and recommendation systems that rely on dense text representations. It belongs to the FlagEmbedding/BGE family alongside related variants like bge-small-en-v1.5 and bge-large-en-v1.5.
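
To make the shared-vector-space idea concrete, the sketch below ranks documents against a query by cosine similarity. It assumes the embeddings have already been obtained from the model (real ones would be 768-dimensional lists of floats); the 3-dimensional vectors and the names `cosine_similarity` and `rank_by_similarity` are illustrative only.

```python
from __future__ import annotations

import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: list[float], docs: dict[str, list[float]]
) -> list[tuple[str, float]]:
    """Return (doc_id, score) pairs sorted from most to least similar to the query."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy 3-dimensional stand-ins for real 768-dimensional embeddings.
query_vec = [0.9, 0.1, 0.0]
doc_vecs = {
    "refund-policy": [0.8, 0.2, 0.1],
    "shipping-times": [0.1, 0.9, 0.2],
    "careers": [0.0, 0.1, 0.9],
}
ranking = rank_by_similarity(query_vec, doc_vecs)
```

If embeddings are L2-normalized up front, cosine similarity reduces to a plain dot product, which is the cheaper operation most vector indexes compute.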

Input / Output

Input

Text sequences for embedding

Output

Dense vector embeddings

Model capabilities

5 Core Capabilities

Text Embeddings

Converts English sentences and passages into 768-dimensional dense vectors capturing semantic meaning for downstream similarity-based applications.
Semantic Search

Supports semantic search by embedding queries and documents into a shared space, enabling retrieval by meaning rather than exact keywords.
Sentence Similarity

Measures similarity between English texts by comparing their embeddings, useful for clustering, deduplication, and paraphrase detection pipelines.
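
For deduplication, one simple approach is to flag every pair of texts whose cosine similarity meets a threshold. This sketch assumes precomputed embeddings; the 0.9 cutoff and the function name `find_near_duplicates` are illustrative assumptions, and a real threshold should be tuned on labeled pairs from your own data.

```python
from __future__ import annotations

import math
from itertools import combinations


def normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def find_near_duplicates(
    embeddings: dict[str, list[float]], threshold: float = 0.9
) -> list[tuple[str, str, float]]:
    """Return every (id_a, id_b, score) pair whose cosine similarity is >= threshold."""
    unit = {key: normalize(vec) for key, vec in embeddings.items()}
    pairs = []
    # Each unordered pair is compared once: O(n^2), so only suitable for small batches.
    for (id_a, vec_a), (id_b, vec_b) in combinations(unit.items(), 2):
        score = sum(x * y for x, y in zip(vec_a, vec_b))
        if score >= threshold:
            pairs.append((id_a, id_b, score))
    return pairs


# Toy vectors standing in for embeddings of three support tickets.
tickets = {
    "T-1": [0.70, 0.70, 0.10],
    "T-2": [0.68, 0.72, 0.12],
    "T-3": [0.10, 0.20, 0.95],
}
duplicates = find_near_duplicates(tickets)
```

For large collections, the pairwise loop would typically be replaced by an approximate nearest-neighbor index.
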
Document Retrieval

Optimized for text retrieval tasks, ranking relevant passages or documents for a given query using vector similarity scores.
RAG Integration

Acts as the embedding backbone in retrieval-augmented generation systems, efficiently indexing and retrieving knowledge for larger language models.
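
In a RAG pipeline, the passages retrieved through embeddings are then packed into the generator's prompt under a size budget. A minimal sketch, assuming the passages arrive already ranked by similarity and approximating length by whitespace-separated words (a real pipeline would count with the generator's own tokenizer); `build_context` and the 20-word budget are hypothetical.

```python
from __future__ import annotations


def build_context(ranked_passages: list[str], max_words: int) -> str:
    """Concatenate top-ranked passages until the next one would exceed the word budget."""
    selected = []
    used = 0
    for passage in ranked_passages:
        cost = len(passage.split())  # crude stand-in for a real token count
        if used + cost > max_words:
            break  # stop at the first passage that does not fit, preserving rank order
        selected.append(passage)
        used += cost
    return "\n\n".join(selected)


passages = [
    "Refunds are issued within 14 days of a return.",
    "Items must be unused and in original packaging.",
    "Our headquarters moved to a new office in 2021.",
]
context = build_context(passages, max_words=20)
prompt = f"Answer using only this context:\n\n{context}\n\nQuestion: How long do refunds take?"
```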

Use cases

6 Most Valuable Use Cases

Semantic Document Search
Question Answer Retrieval
Text Clustering Analysis
RAG Knowledge Base
Recommendation Matching
Duplicate Ticket Detection

Transparent pricing

Cost Comparison

Of the providers listed below, LLM API has the lowest listed price, the lowest latency, and the highest throughput for bge-base-en-v1.5-class models.

Provider	Region	Latency	Throughput	Uptime	Input ($/1M)	Output ($/1M)	Context
LLM API	Global	~80ms	~8,000 tps	99.99%	$0.0100	$0.0100	512 tokens
BAAI	Global	~140ms	~4,000 tps	~99.9%	~$0.0130	~$0.0130	512 tokens
OpenAI	Global	~160ms	~3,000 tps	99.9%	~$0.0200	~$0.0200	8K tokens
Azure AI	US East	~170ms	~2,500 tps	99.9%	~$0.0220	~$0.0220	8K tokens
Replicate	Global	~190ms	~2,000 tps	~99.5%	~$0.0250	~$0.0250	8K tokens
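
Per-1M-token prices turn into a workload cost with one multiplication. The sketch below copies the input prices from the table above; the 50M-token monthly volume is a hypothetical workload, and since several figures are marked approximate (~), the results are estimates only.

```python
from __future__ import annotations

# Input prices in USD per 1M tokens, copied from the Cost Comparison table.
PRICE_PER_MILLION = {
    "LLM API": 0.0100,
    "BAAI": 0.0130,
    "OpenAI": 0.0200,
    "Azure AI": 0.0220,
    "Replicate": 0.0250,
}


def estimate_cost(tokens: int, price_per_million: float) -> float:
    """Cost in USD for embedding `tokens` tokens at a per-1M-token price."""
    return tokens / 1_000_000 * price_per_million


monthly_tokens = 50_000_000  # hypothetical workload
costs = {name: estimate_cost(monthly_tokens, price) for name, price in PRICE_PER_MILLION.items()}
cheapest = min(costs, key=costs.get)
```

At these list prices, 50M tokens comes to roughly $0.50 on LLM API versus $1.25 on Replicate.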

Performance benchmarks

Technical Specifications

Metric	bge-base-en-v1.5 (BAAI)	all-MiniLM-L6-v2 (SBERT)	text-embedding-3-small (OpenAI)
Dimensions	768	384	1536
Max Input Tokens	~512	~256	8K
Price per 1M Tokens	~$0.05	~$0.00	~$0.02
Avg Latency per 1K Tokens	~80ms	~60ms	~90ms
Throughput	~2.5K tps	~3K tps	~2K tps
Uptime	~99.5%	~99.0%	~99.9%
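
Embedding dimension also drives index size. Assuming vectors are stored uncompressed as 32-bit floats (4 bytes per dimension; many vector databases also offer quantized formats that shrink this), the raw storage for the three models in the table works out as follows. `raw_index_bytes` is an illustrative helper and ignores any index overhead.

```python
from __future__ import annotations

BYTES_PER_FLOAT32 = 4

# Output dimensions from the Technical Specifications table.
DIMENSIONS = {
    "bge-base-en-v1.5": 768,
    "all-MiniLM-L6-v2": 384,
    "text-embedding-3-small": 1536,
}


def raw_index_bytes(num_vectors: int, dimensions: int) -> int:
    """Uncompressed float32 storage for the vectors alone."""
    return num_vectors * dimensions * BYTES_PER_FLOAT32


for model, dims in DIMENSIONS.items():
    per_vector = raw_index_bytes(1, dims)
    per_million_gb = raw_index_bytes(1_000_000, dims) / 1e9
    print(f"{model}: {per_vector} bytes/vector, {per_million_gb:.3f} GB per 1M vectors")
```

Under this assumption, one million bge-base-en-v1.5 vectors need about 3.07 GB, half of what text-embedding-3-small's 1536 dimensions require.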

30-day usage via LLM API

620M: Embedding tokens processed (30 days)
5.4M: API requests (30 days)
41.5K: Active developer accounts (30 days)
99.95%: Avg API uptime (30 days)


Architecture & Integration

Why Build on LLM.API?

One unified API. Every major model. Built-in reliability, cost control, and observability.

Unified AI Routing

Automatically route each request to the optimal provider and model based on latency, cost, or performance policies—without changing your application code.
One endpoint, every model
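
The routing idea can be illustrated with a policy function that picks whichever provider minimizes the metric you care about. This is a hypothetical client-side sketch using figures from the Cost Comparison table, not LLM API's actual server-side routing logic; the `Provider` record, `choose_provider`, and the `unavailable` parameter are illustrative names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    name: str
    latency_ms: float
    price_per_million: float


# Approximate figures from the Cost Comparison table.
PROVIDERS = [
    Provider("LLM API", 80, 0.0100),
    Provider("BAAI", 140, 0.0130),
    Provider("OpenAI", 160, 0.0200),
    Provider("Azure AI", 170, 0.0220),
    Provider("Replicate", 190, 0.0250),
]


def choose_provider(
    providers: list[Provider], policy: str, unavailable: frozenset[str] = frozenset()
) -> Provider:
    """Return the available provider that minimizes the metric named by `policy`."""
    metrics = {
        "latency": lambda p: p.latency_ms,
        "cost": lambda p: p.price_per_million,
    }
    if policy not in metrics:
        raise ValueError(f"unknown policy: {policy!r}")
    candidates = [p for p in providers if p.name not in unavailable]
    if not candidates:
        raise RuntimeError("no provider available")
    return min(candidates, key=metrics[policy])
```

Passing the currently unhealthy providers as `unavailable` lets the same function act as a simple failover step.
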
Cost-Aware Orchestration

Control spend with per-route budgets, transparent usage metrics, and intelligent downshifting to cheaper models when quality thresholds are safely met.
Optimize spend by default
Resilient Fallback Flows

Define multi-provider fallbacks that auto-trigger on errors, timeouts, or degraded responses so your critical AI paths keep working in production.
No single point of failure
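
A multi-provider fallback chain reduces to trying each provider in order and moving on when one raises. The sketch below is a hypothetical client-side version in which plain functions stand in for provider clients; a production setup would also separate retryable errors from permanent ones and enforce per-call timeouts.

```python
from __future__ import annotations

from typing import Callable, List, Sequence, Tuple

# A provider client is modeled as a function from texts to vectors.
EmbedFn = Callable[[List[str]], List[List[float]]]


def embed_with_fallback(
    texts: list[str], providers: Sequence[Tuple[str, EmbedFn]]
) -> tuple[str, list[list[float]]]:
    """Try providers in order; return (provider_name, embeddings) from the first success."""
    errors: list[str] = []
    for name, embed in providers:
        try:
            return name, embed(texts)
        except Exception as exc:  # sketch: any error triggers the next provider
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


def flaky_primary(texts: list[str]) -> list[list[float]]:
    raise TimeoutError("primary timed out")


def healthy_backup(texts: list[str]) -> list[list[float]]:
    return [[0.0, 1.0] for _ in texts]  # fake 2-dimensional vectors


used, vectors = embed_with_fallback(
    ["hello"], [("primary", flaky_primary), ("backup", healthy_backup)]
)
```
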
End-to-End Observability

Trace every request across models and providers with logs, metrics, and structured events to debug failures, tune prompts, and prove SLAs.
See every token hop
Task-Level Abstractions

Codify tasks like chat, generation, ranking, and tools once, then swap models or providers behind the scenes without touching business logic.
Code to tasks, not models
High-Throughput Batch APIs

Ship massive workloads through a single batch call with automatic chunking, retries, and concurrency control tuned for throughput and reliability.
Batch at production scale

Decision guide

When to Use — When NOT to Use

Use it if...

You need a strong English sentence-embedding model for general semantic similarity tasks.
You need inexpensive, fast vectorization for large-scale retrieval or RAG pipelines.
Your use case involves clustering or deduplicating many short English texts or titles.
Your use case involves building semantic search over FAQs, documentation, or support tickets.
You need a widely adopted open-source baseline embedding model with strong, well-documented benchmark results.
Your use case involves re-ranking small candidate sets using cosine similarity of embeddings.

Avoid if...

You need multilingual embeddings beyond English, covering many languages with consistent performance.
Your workload requires domain-specialized embeddings, like biomedical or legal text understanding.
You need cross-modal embeddings aligning text with images, audio, or other modalities.
You need extremely high-dimensional, state-of-the-art embeddings for nuanced reasoning-heavy tasks.
Your workload requires very long-context document representation beyond what base models handle well.
You need supervised task-specific models, such as direct question answering or classification.

FAQ

Frequently Asked Questions

What is bge-base-en-v1.5?

bge-base-en-v1.5 is a 768-dimensional English text embedding model from BAAI optimized for retrieval, semantic search, and text similarity tasks.
What is bge-base-en-v1.5 best suited for when used via LLM.API?

It is best suited for building vector search, dense retrieval, reranking pipelines, semantic clustering, and recommendation systems on English text.
What context window should I assume when using bge-base-en-v1.5 for embeddings?

bge-base-en-v1.5 is typically used with inputs up to around 512 tokens, so you should chunk longer documents before embedding.
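
A minimal chunking sketch with overlapping windows follows. It approximates tokens with whitespace-separated words, which undercounts the model's subword tokens, so the 400-word window (a hypothetical value chosen to leave headroom under the 512-token limit) is only a conservative stand-in; a real pipeline should count tokens with the model's own tokenizer. `chunk_words` is an illustrative name.

```python
from __future__ import annotations


def chunk_words(text: str, window: int = 400, overlap: int = 50) -> list[str]:
    """Split text into overlapping windows of at most `window` words."""
    if not 0 <= overlap < window:
        raise ValueError("overlap must be in [0, window)")
    words = text.split()
    step = window - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + window]))
        if start + window >= len(words):
            break  # this window already reaches the end of the text
    return chunks


document = " ".join(f"w{i}" for i in range(1000))
pieces = chunk_words(document)
```

The overlap keeps sentences that straddle a boundary retrievable from at least one chunk.
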
What modalities does bge-base-en-v1.5 support?

bge-base-en-v1.5 supports only text-to-vector embeddings and does not handle images, audio, or code execution.
How is bge-base-en-v1.5 priced on LLM.API?

Pricing is usage-based per embedded token and may differ from BAAI’s own deployment, so check the LLM.API pricing page for current rates.
What latency should I expect from bge-base-en-v1.5 on LLM.API?

You can generally expect low, sub-second latency for short texts, depending on request batch size and your network conditions.
How do I call bge-base-en-v1.5 through LLM.API?

Send a request to the LLM.API embeddings endpoint with the model name set to "bge-base-en-v1.5" and your English text as the input.
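
The request below is a sketch only. It assumes an OpenAI-style embeddings endpoint that accepts JSON with `model` and `input` fields, authenticates with a bearer token, and returns vectors under `data[i].embedding`; the URL, header, and response shape are all assumptions to confirm against LLM.API's own documentation. The code builds the request without sending it.

```python
from __future__ import annotations

import json
import urllib.request

# Hypothetical endpoint; replace with the URL from LLM.API's documentation.
EMBEDDINGS_URL = "https://api.example.com/v1/embeddings"


def build_embedding_request(texts: list[str], api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST request for bge-base-en-v1.5 embeddings."""
    body = json.dumps({"model": "bge-base-en-v1.5", "input": texts}).encode("utf-8")
    return urllib.request.Request(
        EMBEDDINGS_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )


def parse_embeddings(response_body: bytes) -> list[list[float]]:
    """Extract vectors from an assumed OpenAI-style response, ordered by `index`."""
    payload = json.loads(response_body)
    items = sorted(payload["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


request = build_embedding_request(["What is dense retrieval?"], api_key="YOUR_API_KEY")
# Sending would look like: parse_embeddings(urllib.request.urlopen(request).read())
```
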
How does bge-base-en-v1.5 compare to larger BGE models?

Compared to larger BGE variants, it offers smaller embeddings and faster inference at the cost of slightly lower retrieval accuracy.
Can I use bge-base-en-v1.5 for multilingual text?

It is primarily trained for English, so performance on non-English text will generally be weaker than on English inputs.
What limitations should I be aware of when using bge-base-en-v1.5?

It does not generate text, may lose information on very long inputs, and its embeddings can reflect biases present in training data.

