GTE-Base

Text Embedding

GTE-Base by Thenlper is an English text embedding model that encodes sentences and paragraphs into 768-dimensional vectors for efficient semantic similarity and retrieval tasks. It is part of the General Text Embeddings (GTE) family trained with multi-stage contrastive learning.

Start Using API

API Performance

Latency: ~120ms avg embedding latency
Context: 512 max input tokens
Input: Free per 1M tokens
Output: Free per 1M tokens
Uptime: 99%

About the model

What is GTE-Base?

GTE-Base is a BERT-based General Text Embeddings (GTE) model that maps English text into 768-dimensional dense vectors optimized for semantic representations. It is mainly used for semantic search and information retrieval, where it provides high-quality embeddings for matching queries with relevant documents. It is also widely applied to tasks such as clustering, reranking, and semantic textual similarity across diverse domains. GTE-Base belongs to the GTE model family (alongside GTE-Small and GTE-Large) introduced in the “Towards General Text Embeddings with Multi-stage Contrastive Learning” work.

Input / Output

Input

Text (English sentences, paragraphs; up to 512 tokens)

Output

Text embeddings (768‑dimensional numeric vectors)
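Because input is capped at 512 tokens, longer text has to be split before it is embedded. The sketch below shows one simple way to do that with overlapping windows. It counts whitespace-separated words, which is only a rough stand-in for the model's tokenizer (subword tokenizers usually produce more tokens than words), so the default budget is an assumed, deliberately conservative value. The function name and parameters are illustrative and not part of any API.

```python
def chunk_words(text, max_words=300, overlap=30):
    """Split text into overlapping windows of at most max_words words.

    Word counts are a crude proxy for model tokens, so max_words should stay
    well below the 512-token input limit.
    """
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be >= 0 and smaller than max_words")

    words = text.split()
    step = max_words - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once the current window already reaches the end of the text.
        if start + max_words >= len(words):
            break
    return chunks
```

Each chunk is then embedded separately. The overlap keeps sentences that straddle a boundary represented in both neighbouring chunks.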

Model capabilities

5 Core Capabilities

Text Embedding

Encodes English sentences and paragraphs into 768-dimensional dense vectors optimized for general-purpose semantic representation and downstream tasks.
Semantic Search

Supports efficient semantic search by embedding queries and documents into a shared space to retrieve meaningfully related results.
Sentence Similarity

Computes similarity between sentences or paragraphs using cosine similarity in the embedding space for clustering and comparison.
Text Reranking

Improves ranking of candidate texts, leveraging relevance-focused embeddings to reorder search or retrieval results more accurately.
Classification Support

Provides embeddings suitable as input features for various text classification tasks across diverse domains and benchmarks.
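The search and similarity capabilities above both reduce to comparing vectors. Here is a minimal sketch of cosine similarity and top-k ranking in plain Python. The helper names are illustrative, and the short toy vectors in the usage note stand in for real 768-dimensional embeddings.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs, top_k=3):
    """Return (document index, score) pairs, most similar first."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

For example, `rank_documents([1, 0, 0], [[0, 1, 0], [1, 0, 0], [1, 1, 0]], top_k=2)` places document 1 first and document 2 second. For large collections, a vector index would normally replace this linear scan.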

Use cases

6 Most Valuable Use Cases

Semantic Text Search
Document Clustering
Text Reranking
Duplicate Detection
Recommendation Matching
Sentence Similarity Scoring

Transparent pricing

Cost Comparison

The table below compares LLM API with other embedding providers on latency, throughput, uptime, price, and context length. Figures marked with ~ are approximate.

Provider	Region	Latency	Throughput	Uptime	Input ($/1M)	Output ($/1M)	Context
LLM API	Global	120ms	1,200 tps	99.99%	$0.03	$0.03	512 tokens
Thenlper (Original Hosting)	Global	~250ms	~300 tps	~99.5%	~$0.10	~$0.10	512 tokens
OpenAI (text-embedding-3-small Equivalent)	Global	~220ms	~500 tps	99.9%	$0.02	$0.02	8192 tokens
Cohere (embed-english-light-v3)	US East	~200ms	~400 tps	99.9%	~$0.08	~$0.08	4096 tokens
Azure OpenAI (text-embedding-3-small)	Global	~190ms	~450 tps	99.9%	~$0.11	~$0.11	8192 tokens
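Per-million-token prices such as those in the table translate into a workload cost with simple arithmetic. The sketch below is only an illustration: the function name is hypothetical, and the rates passed in should come from your current pricing page rather than from this example.

```python
def embedding_cost_usd(total_tokens, price_per_million_usd):
    """Estimate the cost of embedding total_tokens at a given $/1M-token rate."""
    if total_tokens < 0 or price_per_million_usd < 0:
        raise ValueError("token count and price must be non-negative")
    return total_tokens / 1_000_000 * price_per_million_usd


# Illustrative workload: 50 million tokens per month at two listed rates.
monthly_tokens = 50_000_000
for label, rate in [("$0.03 per 1M", 0.03), ("$0.08 per 1M", 0.08)]:
    print(f"{label}: ${embedding_cost_usd(monthly_tokens, rate):.2f} per month")
```

Embedding requests are typically billed on input tokens only, so a single rate is enough for this estimate.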

Performance benchmarks

Technical Specifications

Metric	GTE-Base (Thenlper)	text-embedding-3-small (OpenAI)	bge-base-en-v1.5 (BAAI)
Dimensions	768	1,536	768
Max Input Tokens	512	~8K	512
Price per 1M Tokens	~$0.10	$0.02	~$0.05
Avg Latency	~120ms	~90ms	~130ms
Throughput	~1,500 tps	~2,000 tps	~1,200 tps
Uptime	~99.5%	~99.9%	~99.0%

30-day usage via LLM API

3.1B: Embedding tokens processed (30 days)
9.4M: API requests served (30 days)
410K: Unique developer accounts (30 days)
99.8%: Avg API uptime (30 days)

Architecture & Integration

Why Build on LLM.API?

One unified API. Every major model. Built-in reliability, cost control, and observability.

Unified AI Routing

Automatically route each request to the best model across providers based on cost, latency, and quality—without changing your integration or redeploying code.
One endpoint. Any model.
Cost-Aware Controls

Enforce per-model and per-project budgets with smart price-aware routing and guardrails so you never blow through spend while still meeting performance targets.
Predictable spend at scale.
Resilient Fallbacks

Recover from provider outages or rate limits automatically with configurable fallback chains that keep your application online, responsive, and consistent under failure.
Never go dark again.
Deep Observability

Get full visibility into every call—latency, cost, provider, model, and errors—with searchable traces and metrics that plug into your existing monitoring stack.
Trace every token.
Task-Aware Orchestration

Define high-level tasks—chat, retrieval, tools, vision—and let LLM.API pick and orchestrate the right models and parameters for each use case.
Describe intent, not models.
High-Throughput Batch

Process millions of requests efficiently with server-side batching, automatic chunking, and retry logic that slashes costs and squeezes maximum throughput from every provider.
Scale to millions safely.
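To make the fallback-chain idea concrete, here is a client-side sketch of the general pattern: try each provider in order, retry a limited number of times, and move on when one keeps failing. It is not LLM.API's implementation. The function name, the `providers` structure, and the retry parameters are all assumptions made for illustration.

```python
import time


def embed_with_fallback(texts, providers, retries_per_provider=1, backoff_seconds=0.0):
    """Try each (name, embed_fn) pair in order and return the first success.

    providers is a list of (name, callable) pairs, where the callable takes a
    list of strings and returns a list of vectors. These callables are
    hypothetical placeholders for real provider clients.
    """
    errors = []
    for name, embed_fn in providers:
        for attempt in range(retries_per_provider + 1):
            try:
                return name, embed_fn(texts)
            except Exception as exc:  # broad on purpose: any failure triggers fallback
                errors.append(f"{name} attempt {attempt + 1}: {exc}")
                if backoff_seconds:
                    # Exponential backoff between attempts.
                    time.sleep(backoff_seconds * (2 ** attempt))
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

The function returns the name of the provider that answered together with its vectors, which makes fallbacks visible in logs. Note that vectors from different embedding models are not comparable, so a real fallback chain for embeddings should only switch between hosts serving the same model.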

Decision guide

When to Use — When NOT to Use

Use it if...

You need a lightweight general-purpose text embedding model for semantic similarity tasks.
You need to power semantic search over short to medium-length English text snippets.
You need efficient embedding generation for clustering or topic modeling at scale.
Your use case involves intent matching or FAQ retrieval with modest accuracy requirements.
Your use case involves building recommendation features based on textual content similarity.
You need a small, open-source embedding model that is easy to self-host.

Avoid if...

You need state-of-the-art embedding performance across many languages and specialized domains.
You need robust embeddings for very long documents or multi-page contexts.
Your workload requires task-specific embeddings tuned for code or mathematical reasoning.
You need production-grade support, SLAs, and monitoring from a major cloud provider.
Your workload requires real-time personalization using extremely high-precision semantic representations.
You need multimodal embeddings that jointly represent text with images, audio, or video.

FAQ

Frequently Asked Questions

What is GTE-Base?

GTE-Base is a sentence embedding model by Thenlper, optimized for generating dense text embeddings for retrieval, clustering, and semantic similarity tasks.
What is GTE-Base best suited for?

GTE-Base is best for semantic search, question answering over documents, duplicate detection, and other tasks requiring high-quality sentence or passage embeddings.
What modalities does GTE-Base support via LLM.API?

GTE-Base is text-only and supports embedding text inputs; it does not process images, audio, or video.
How does pricing for GTE-Base work on LLM.API?

On LLM.API, GTE-Base is billed per input token processed for embeddings; check your LLM.API pricing dashboard for the current rate.
What is the context window of GTE-Base on LLM.API?

GTE-Base accepts up to 512 input tokens; longer documents should be split into chunks before embedding.
How is the latency and speed of GTE-Base through LLM.API?

GTE-Base is lightweight and generally returns embeddings with low latency, suitable for real-time or interactive semantic search applications.
How do I call GTE-Base through the LLM.API platform?

Use the LLM.API embeddings endpoint, specifying the provider as Thenlper and the model name as GTE-Base in your request parameters.
How does GTE-Base compare to larger embedding models?

GTE-Base is smaller and faster than many large embedding models, often with slightly lower accuracy but significantly lower cost and latency.
Can I use GTE-Base for code, tables, or other non-natural-language content?

GTE-Base is primarily trained for natural language text, so embeddings for code or highly structured data may be less accurate.
What are the main limitations of GTE-Base?

GTE-Base may underperform on very specialized domains, extremely long documents, or tasks requiring deep logical reasoning beyond surface semantic similarity.

Start in 2 lines of code
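The page does not show the request itself, so the following is a hedged sketch of what a call might look like using only the Python standard library. The endpoint URL, the model identifier, and the OpenAI-style request and response shapes are all assumptions. Check the LLM.API documentation for the real values before using it.

```python
import json
import urllib.request

# Hypothetical placeholders: replace with the endpoint and model id from the LLM.API docs.
API_URL = "https://api.example.com/v1/embeddings"
MODEL_ID = "thenlper/gte-base"


def build_embedding_request(texts, api_key, url=API_URL, model=MODEL_ID):
    """Build a POST request carrying an assumed OpenAI-style embeddings payload."""
    payload = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def fetch_embeddings(request, timeout=30):
    """Send the request and return one vector per input text.

    Assumes a response shaped like {"data": [{"embedding": [...]}, ...]}.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return [item["embedding"] for item in body["data"]]
```

With a valid key, the two calls are `request = build_embedding_request(["hello world"], api_key)` followed by `vectors = fetch_embeddings(request)`.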

Get My API Key
