Powered by Thenlper
GTE-Base
- Text Embedding
GTE-Base by Thenlper is an English text embedding model that encodes sentences and paragraphs into 768-dimensional vectors for efficient semantic similarity and retrieval tasks. It is part of the General Text Embeddings (GTE) family trained with multi-stage contrastive learning.
About the model
What is GTE-Base?
GTE-Base is a BERT-based General Text Embeddings (GTE) model that maps English text into 768-dimensional dense vectors, so that texts with similar meaning land close together in the vector space. It is mainly used for semantic search and information retrieval, where queries and documents are matched by the similarity of their embeddings. It is also applied to clustering, reranking, and semantic textual similarity across diverse domains. GTE-Base belongs to the GTE model family (alongside GTE-Small and GTE-Large) introduced in the paper “Towards General Text Embeddings with Multi-stage Contrastive Learning”.
Model capabilities
5 Core Capabilities
-
Text Embedding
Encodes English sentences and paragraphs into 768-dimensional dense vectors optimized for general-purpose semantic representation and downstream tasks.
-
Semantic Search
Supports efficient semantic search by embedding queries and documents into a shared space to retrieve meaningfully related results.
-
Sentence Similarity
Computes similarity between sentences or paragraphs using cosine similarity in the embedding space, for clustering and comparison.
-
Text Reranking
Reorders candidate texts from search or retrieval results by scoring each candidate's embedding against the query embedding.
-
Classification Support
Provides embeddings suitable as input features for various text classification tasks across diverse domains and benchmarks.
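The similarity and reranking capabilities above both reduce to comparing vectors. The following is a minimal sketch of that comparison, assuming the embeddings have already been obtained; the toy 3-dimensional vectors and the helper names `cosine_similarity` and `rank_by_similarity` are illustrative stand-ins, not part of any GTE-Base or LLM.API interface.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as having no similarity to anything
    return dot / (norm_a * norm_b)

def rank_by_similarity(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy 3-dimensional vectors stand in for real 768-dimensional embeddings.
query = [1.0, 0.0, 1.0]
docs = {
    "doc_a": [0.9, 0.1, 0.8],    # points roughly the same way as the query
    "doc_b": [0.0, 1.0, 0.0],    # orthogonal to the query
    "doc_c": [-1.0, 0.0, -1.0],  # points the opposite way
}
ranking = rank_by_similarity(query, docs)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

With real embeddings the same ranking step serves semantic search (rank all documents) and reranking (rank a short candidate list).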
Use cases
6 Most Valuable Use Cases
- Semantic Text Search
- Document Clustering
- Text Reranking
- Duplicate Detection
- Recommendation Matching
- Sentence Similarity Scoring
Transparent pricing
Cost Comparison
In the comparison below, LLM API lists the lowest price per 1M tokens, the lowest latency, and the highest throughput among the embedding providers shown.
| Provider | Region | Latency | Throughput | Uptime | Input ($/1M) | Output ($/1M) | Context |
|---|---|---|---|---|---|---|---|
| LLM API (Best) | Global | 120ms | 1,200 tps | 99.99% | $0.03 | $0.03 | 8K tokens |
| Thenlper (Original Hosting) | Global | ~250ms | ~300 tps | ~99.5% | ~$0.10 | ~$0.10 | ~4K tokens |
| OpenAI (text-embedding-3-small Equivalent) | Global | ~220ms | ~500 tps | 99.9% | ~$0.10 | ~$0.10 | 8192 tokens |
| Cohere (embed-english-light-v3) | US East | ~200ms | ~400 tps | 99.9% | ~$0.08 | ~$0.08 | 4096 tokens |
| Azure OpenAI (text-embedding-3-small) | Global | ~190ms | ~450 tps | 99.9% | ~$0.11 | ~$0.11 | 8192 tokens |
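To see what the per-1M-token prices mean for a concrete workload, the arithmetic can be sketched as below. The monthly volume of 500M tokens is a hypothetical figure chosen for illustration, and the prices are copied from the table above (several of which are marked approximate there).

```python
def embedding_cost_usd(tokens, price_per_million_usd):
    """Cost in USD of embedding `tokens` tokens at a price quoted per 1M tokens."""
    if tokens < 0 or price_per_million_usd < 0:
        raise ValueError("tokens and price must be non-negative")
    return tokens / 1_000_000 * price_per_million_usd

# Hypothetical workload: 500 million input tokens per month.
monthly_tokens = 500_000_000

# Input prices ($/1M tokens) as listed in the comparison table.
prices = {
    "LLM API": 0.03,
    "Cohere (embed-english-light-v3)": 0.08,
    "Azure OpenAI (text-embedding-3-small)": 0.11,
}

for provider, price in prices.items():
    cost = embedding_cost_usd(monthly_tokens, price)
    print(f"{provider}: ${cost:,.2f} per month")
```

Because embedding calls produce vectors rather than generated text, the input price is normally the figure that drives the bill.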
Performance benchmarks
Technical Specifications
| Metric | GTE-Base (Thenlper) | text-embedding-3-small (OpenAI) | bge-base-en-v1.5 (BAAI) |
|---|---|---|---|
| Dimensions | 768 | 1,536 | 768 |
| Max Input Tokens | 512 | 8,191 | 512 |
| Price per 1M Tokens | ~$0.10 | $0.02 | ~$0.05 |
| Avg Latency | ~120ms | ~90ms | ~130ms |
| Throughput | ~1,500 tps | ~2,000 tps | ~1,200 tps |
| Uptime | ~99.5% | ~99.9% | ~99.0% |
30-day usage via LLM API
- 3.1B embedding tokens processed (30 days)
- 9.4M API requests served (30 days)
- 410K unique developer accounts (30 days)
- 99.8% average API uptime (30 days)
Architecture & Integration
Why Build on LLM.API?
One unified API. Every major model. Built-in reliability, cost control, and observability.
-
Unified AI Routing
Automatically route each request to the best model across providers based on cost, latency, and quality—without changing your integration or redeploying code.
One endpoint. Any model.
-
Cost-Aware Controls
Enforce per-model and per-project budgets with smart price-aware routing and guardrails so you never blow through spend while still meeting performance targets.
Predictable spend at scale.
-
Resilient Fallbacks
Recover from provider outages or rate limits automatically with configurable fallback chains that keep your application online, responsive, and consistent under failure.
Never go dark again.
-
Deep Observability
Get full visibility into every call—latency, cost, provider, model, and errors—with searchable traces and metrics that plug into your existing monitoring stack.
Trace every token.
-
Task-Aware Orchestration
Define high-level tasks—chat, retrieval, tools, vision—and let LLM.API pick and orchestrate the right models and parameters for each use case.
Describe intent, not models.
-
High-Throughput Batch
Process millions of requests efficiently with server-side batching, automatic chunking, and retry logic that slashes costs and squeezes maximum throughput from every provider.
Scale to millions safely.
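The fallback behaviour described above can be pictured as trying providers in order until one succeeds. This is a generic sketch of that pattern, not LLM.API's actual implementation: `embed_with_fallback`, `AllProvidersFailed`, and the two stand-in provider functions are hypothetical names invented for the example.

```python
class AllProvidersFailed(Exception):
    """Raised when every provider in the fallback chain has failed."""

def embed_with_fallback(text, providers):
    """Try each (name, embed_fn) pair in order; return (name, vector) from the first success."""
    errors = []
    for name, embed_fn in providers:
        try:
            return name, embed_fn(text)
        except Exception as exc:  # a real client would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))

# Hypothetical stand-ins for real provider clients.
def flaky_primary(text):
    raise TimeoutError("simulated outage")

def stable_secondary(text):
    # Returns a fake 3-dimensional vector instead of a real embedding.
    return [float(len(text)), 0.0, 1.0]

chain = [("primary", flaky_primary), ("secondary", stable_secondary)]
used, vector = embed_with_fallback("hello", chain)
print(used, vector)
```

One caveat for embeddings specifically: vectors from different models are not comparable, so a fallback chain should switch between hosts of the same model rather than between different embedding models.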
Decision guide
When to Use — When NOT to Use
Use it if...
- You need a lightweight general-purpose text embedding model for semantic similarity tasks.
- You need to power semantic search over short to medium-length English text snippets.
- You need efficient embedding generation for clustering or topic modeling at scale.
- Your use case involves intent matching or FAQ retrieval with modest accuracy requirements.
- Your use case involves building recommendation features based on textual content similarity.
- You need a small, open-source embedding model that is easy to self-host.
Avoid if...
- You need state-of-the-art embedding performance across many languages and specialized domains.
- You need robust embeddings for very long documents or multi-page contexts.
- Your workload requires task-specific embeddings tuned for code or mathematical reasoning.
- You need production-grade support, SLAs, and monitoring from a major cloud provider.
- Your workload requires real-time personalization using extremely high-precision semantic representations.
- You need multimodal embeddings that jointly represent text with images, audio, or video.
FAQ
Frequently Asked Questions
-
What is GTE-Base?
GTE-Base is a sentence embedding model by Thenlper, optimized for generating dense text embeddings for retrieval, clustering, and semantic similarity tasks.
-
What is GTE-Base best suited for?
GTE-Base is best for semantic search, question answering over documents, duplicate detection, and other tasks requiring high-quality sentence or passage embeddings.
-
What modalities does GTE-Base support via LLM.API?
GTE-Base is text-only and supports embedding text inputs; it does not process images, audio, or video.
-
How does pricing for GTE-Base work on LLM.API?
On LLM.API, GTE-Base is billed per input token processed for embeddings; check your LLM.API pricing dashboard for the current rate.
-
What is the context window of GTE-Base on LLM.API?
The open-source GTE-Base checkpoint truncates input beyond 512 tokens, so long documents should be split into chunks before embedding; check the LLM.API model listing for the limit enforced on the hosted endpoint.
-
What are the latency and speed of GTE-Base through LLM.API?
GTE-Base is lightweight and generally returns embeddings with low latency, suitable for real-time or interactive semantic search applications.
-
How do I call GTE-Base through the LLM.API platform?
Use the LLM.API embeddings endpoint, specifying the provider as Thenlper and the model name as GTE-Base in your request parameters.
-
How does GTE-Base compare to larger embedding models?
GTE-Base is smaller and faster than many large embedding models, often with slightly lower accuracy but significantly lower cost and latency.
-
Can I use GTE-Base for code, tables, or other non-natural-language content?
GTE-Base is primarily trained for natural language text, so embeddings for code or highly structured data may be less accurate.
-
What are the main limitations of GTE-Base?
GTE-Base may underperform on very specialized domains, extremely long documents, or tasks requiring deep logical reasoning beyond surface semantic similarity.
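Two of the answers above, chunking long documents and naming the provider and model in the request, can be combined in one sketch. Everything specific here is an assumption: the JSON field names (`provider`, `model`, `input`) are hypothetical and not a documented LLM.API schema, and chunks are measured in whitespace-separated words as a rough proxy for tokens rather than with the model's tokenizer.

```python
import json

def chunk_words(text, max_words=200, overlap=20):
    """Split text into overlapping windows of whitespace-separated words."""
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reached the end of the text
    return chunks

def build_embedding_request(chunks):
    """Assemble a JSON request body. Field names are hypothetical, for illustration only."""
    return json.dumps({
        "provider": "Thenlper",
        "model": "GTE-Base",
        "input": chunks,
    })

# A synthetic 450-word document stands in for a long real one.
document = " ".join(f"word{i}" for i in range(450))
chunks = chunk_words(document, max_words=200, overlap=20)
body = build_embedding_request(chunks)
print(f"{len(chunks)} chunks, request body of {len(body)} bytes")
```

The overlap keeps sentences that straddle a boundary represented in both neighbouring chunks; the right chunk size depends on the token limit the endpoint actually enforces.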
