Powered by Thenlper
GTE-Large
- Text Embedding
GTE-Large is a general-purpose English text embedding model from Thenlper in the General Text Embeddings (GTE) family. It produces 1,024-dimensional sentence embeddings optimized for semantic similarity and retrieval tasks.
About the model
What is GTE-Large?
GTE-Large is a BERT-based General Text Embeddings model released by Thenlper that generates 1,024-dimensional sentence and document embeddings for English text. It is mainly used for information retrieval and semantic search, where dense vector representations are required to match queries with relevant passages or documents. It is also applied to tasks such as semantic textual similarity, clustering, reranking, and various downstream applications evaluated on the MTEB benchmark. GTE-Large belongs to the GTE family of models introduced in the paper “Towards General Text Embeddings with Multi-stage Contrastive Learning,” alongside smaller variants like GTE-Base and GTE-Small.
Model capabilities
5 Core Capabilities
- Text Embedding
  Encodes English sentences, paragraphs, and moderate-length documents into dense 1,024-dimensional vectors for downstream semantic tasks.
- Semantic Similarity
  Generates embeddings enabling accurate semantic textual similarity comparisons between sentence or document pairs using vector distance metrics.
- Information Retrieval
  Produces high-quality embeddings optimized for retrieval pipelines, improving search ranking and relevance over traditional lexical approaches.
- Reranking Support
  Provides rich semantic embeddings that can rerank candidate search or recommendation results for better ordering and relevance.
- Clustering Usage
  Offers consistent vector representations suitable for clustering texts into semantically coherent groups in analytics or discovery workflows.
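As a rough illustration of the "vector distance metrics" mentioned above, the sketch below computes cosine similarity between two vectors in plain Python. The short toy vectors are made up for the example and stand in for real 1,024-dimensional GTE-Large embeddings; the function name `cosine_similarity` is our own, not part of any SDK.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Toy 4-dimensional vectors (assumed values, not real model output).
query = [0.1, 0.3, 0.5, 0.1]
close = [0.1, 0.29, 0.52, 0.08]
far = [0.9, -0.2, 0.0, 0.1]

# The near-duplicate vector scores higher than the unrelated one.
print(cosine_similarity(query, close) > cosine_similarity(query, far))  # True
```

In practice the same comparison is usually done with a vector library or a vector database; the arithmetic is the same.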
Use cases
6 Most Valuable Use Cases
- Semantic Search
- Information Retrieval
- Semantic Reranking
- Text Clustering
- Text Similarity Scoring
- General Embedding Tasks
Transparent pricing
Cost Comparison
In the comparison below, LLM API lists the lowest per-token price and the lowest latency among the providers shown for GTE-Large-class models.
| Provider | Region | Latency | Throughput | Uptime | Input ($/1M) | Output ($/1M) | Context |
|---|---|---|---|---|---|---|---|
| LLM API (Best) | Global | 80ms | ~15K tokens/s | 99.99% | $0.03 per 1M tokens | $0.03 per 1M tokens | 512 tokens |
| Thenlper (Direct) | Global | ~220ms | ~5K tokens/s | ~99.5% | ~$0.10 per 1M tokens | ~$0.10 per 1M tokens | 512 tokens |
| OpenAI (text-embedding-3-large equivalent) | Global | ~200ms | ~10K tokens/s | 99.9% | $0.13 per 1M tokens | $0.13 per 1M tokens | ~8K tokens |
| AWS Bedrock (similar embedding model) | US East | ~250ms | ~8K tokens/s | 99.9% | ~$0.20 per 1M tokens | ~$0.20 per 1M tokens | ~8K tokens |
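To make the per-token prices easier to compare, here is a minimal cost-estimate sketch. The input prices are copied from the table above; the monthly volume of 500M tokens is a hypothetical figure chosen only for illustration, and `embedding_cost` is our own helper name.

```python
def embedding_cost(tokens, price_per_million):
    """Dollar cost of embedding `tokens` tokens at a price quoted per 1M tokens."""
    if tokens < 0 or price_per_million < 0:
        raise ValueError("tokens and price must be non-negative")
    return tokens / 1_000_000 * price_per_million

# Input prices ($ per 1M tokens) as listed in the comparison table.
listed_prices = {
    "LLM API": 0.03,
    "Thenlper (Direct)": 0.10,
    "OpenAI": 0.13,
    "AWS Bedrock": 0.20,
}

monthly_tokens = 500_000_000  # hypothetical workload, not a measured figure

for provider, price in listed_prices.items():
    print(f"{provider}: ${embedding_cost(monthly_tokens, price):,.2f} per month")
```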
Performance benchmarks
Technical Specifications
| Metric | GTE-Large (Thenlper) | text-embedding-3-large (OpenAI) | bge-large-en-v1.5 (BAAI) |
|---|---|---|---|
| Dimensions | 1024 | 3072 | 1024 |
| Max Input Tokens | 512 | 8,191 | 512 |
| Price per 1M Tokens | ~$0.02 | $0.13 | ~$0.01 |
| Avg Latency | ~120ms | ~200ms | ~150ms |
| Throughput | ~1,500 tps | ~800 tps | ~1,200 tps |
| Uptime | ~99.5% | 99.9% | ~99.0% |
30-day usage via LLM API
- 1.9B prompt tokens processed (30 days)
- 11.4M embedding API requests (30 days)
- 420K unique applications using GTE-Large (30 days)
- 99.95% average API uptime (30 days)
Architecture & Integration
Why Build on LLM.API?
One unified API. Every major model. Built-in reliability, cost control, and observability.
- Unified AI Routing
  Automatically route each request to the optimal provider and model based on latency, cost, or quality, without changing your integration or redeploying code.
  One endpoint, every model
- Cost-Aware Orchestration
  Control spend with smart tiering, per-route budgets, and provider mix policies that automatically balance price versus performance across all your AI workloads.
  Cut cost, keep quality
- Resilient Fallbacks
  Define multi-provider fallback chains so requests transparently fail over on errors, rate limits, or outages, keeping your AI features reliable in production.
  Stay online, automatically
- End-to-End Observability
  Get unified logs, metrics, and traces across providers with request-level insights into tokens, latency, errors, and model behavior in one place.
  See every token
- Task-Level Abstractions
  Call high-level tasks (chat, generation, embeddings, tools) instead of vendor-specific APIs, so you can swap models or providers without rewriting business logic.
  Code to tasks, not vendors
- High-Throughput Batch
  Run large-scale batch jobs with automatic chunking, retries, and concurrency control to fully utilize provider limits while keeping throughput predictable.
  Scale jobs, not code
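The fallback-chain idea described above can be sketched generically as follows. This is not LLM.API's implementation or configuration format; `call_with_fallback` and the two stand-in provider functions are hypothetical names used only to show the control flow of retrying one provider and then failing over to the next.

```python
import time

def call_with_fallback(providers, request, retries_per_provider=2, backoff_seconds=0.0):
    """Try each (name, callable) in order; return (name, result) from the first success."""
    errors = []
    for name, call in providers:
        for attempt in range(retries_per_provider):
            try:
                return name, call(request)
            except Exception as exc:  # a real client would catch narrower error types
                errors.append((name, attempt, repr(exc)))
                if backoff_seconds:
                    # Exponential backoff between retries on the same provider.
                    time.sleep(backoff_seconds * (2 ** attempt))
    raise RuntimeError(f"all providers failed: {errors}")

# Stand-in providers (hypothetical): the first always fails, the second succeeds.
def always_times_out(texts):
    raise TimeoutError("simulated provider outage")

def healthy_provider(texts):
    return [[0.0, 1.0] for _ in texts]

chain = [("primary", always_times_out), ("secondary", healthy_provider)]
used, vectors = call_with_fallback(chain, ["hello world"])
print(used)  # secondary
```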
Decision guide
When to Use — When NOT to Use
Use it if...
- You need a general-purpose text embedding model for semantic search or retrieval.
- You need sentence embeddings for English text; GTE-Large is an English-only model, not a multilingual one.
- You need relatively lightweight embeddings that are cheaper than very large transformers.
- Your use case involves clustering or deduplicating large text corpora by semantic similarity.
- Your use case involves reranking search results using cosine similarity between query and documents.
- You need an open-source, locally deployable embedding model without proprietary dependencies.
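The reranking case in the list above can be sketched as follows: normalize the query and candidate vectors, score each candidate by dot product (which equals cosine similarity after normalization), and sort. The 2-dimensional vectors and document IDs are toy values, and `rerank` is an illustrative helper, not an LLM.API function.

```python
import math

def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]

def rerank(query_vec, candidates, top_k=None):
    """Sort (doc_id, vector) candidates by cosine similarity to the query, best first."""
    q = normalize(query_vec)
    scored = []
    for doc_id, vec in candidates:
        d = normalize(vec)
        scored.append((doc_id, sum(a * b for a, b in zip(q, d))))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored if top_k is None else scored[:top_k]

# Toy candidates (assumed values, not real embeddings).
candidates = [("doc-a", [0.0, 1.0]), ("doc-b", [1.0, 0.1]), ("doc-c", [0.5, 0.5])]
ranking = rerank([1.0, 0.0], candidates)
print([doc_id for doc_id, _ in ranking])  # ['doc-b', 'doc-c', 'doc-a']
```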
Avoid if...
- You need state-of-the-art retrieval quality matching the newest large proprietary embedding models.
- Your workload requires embedding very long documents in a single pass; GTE-Large truncates inputs beyond 512 tokens, so longer text must be chunked first.
- You need cross-modal embeddings that jointly handle text and images in one space.
- You need domain-specialized embeddings for code, biology, or legal texts out-of-the-box.
- Your workload requires strict latency guarantees on edge devices with extremely limited compute.
- You need embeddings that are continuously updated and versioned as a managed cloud service.
FAQ
Frequently Asked Questions
- What is GTE-Large?
  GTE-Large is a sentence embedding model by Thenlper optimized for high-quality text similarity, retrieval, and semantic search tasks.
- What is GTE-Large best suited for?
  GTE-Large is best for generating dense vector embeddings for search, clustering, recommendation, and RAG retrieval over large text corpora.
- What modalities does GTE-Large support?
  GTE-Large is a text-only model that accepts natural language input and outputs fixed-size vector embeddings.
- How do I access GTE-Large through LLM.API?
  You call the LLM.API embeddings endpoint with the GTE-Large model name, passing your text inputs and receiving embedding vectors in the response.
- How does GTE-Large compare to similar embedding models?
  GTE-Large typically offers strong semantic retrieval quality comparable to other large general-purpose embedding models, with competitive performance on common benchmark datasets.
- What is the context window of GTE-Large?
  GTE-Large accepts up to 512 tokens per input. Longer text is truncated, so very long documents should be chunked before embedding.
- How fast is GTE-Large and what latency should I expect via LLM.API?
  Latency depends on LLM.API infrastructure and batch size, but GTE-Large is designed for practical real-time or near-real-time embedding workloads.
- What does GTE-Large cost to use on LLM.API?
  Pricing for GTE-Large is determined by LLM.API and is typically based on the number of tokens or characters embedded per request.
- Does GTE-Large support batch embedding through LLM.API?
  Yes, you can send multiple input texts in a single embeddings request to LLM.API to get batched GTE-Large embeddings.
- What are the main limitations of GTE-Large?
  GTE-Large cannot generate or understand images, may underperform on highly domain-specific jargon, and does not perform generative text completion.
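The chunking and batching answers above can be combined into a small preprocessing sketch. It assumes whitespace-separated words as a rough stand-in for tokens; the model's real tokenizer usually produces more tokens than words, so the 300-word budget is a deliberately conservative guess under the 512-token limit, not a measured value. `chunk_words` and `batched` are hypothetical helpers, and the actual request format for the embeddings endpoint is not shown.

```python
def chunk_words(text, max_words=300, overlap=30):
    """Split text into overlapping chunks of at most `max_words` whitespace-separated words."""
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    if not words:
        return []
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks

def batched(items, batch_size):
    """Group items into lists of at most `batch_size` for one request each."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

# A synthetic 700-word document (placeholder text).
document = " ".join(f"w{i}" for i in range(700))
chunks = chunk_words(document)
print([len(c.split()) for c in chunks])              # [300, 300, 160]
print([len(batch) for batch in batched(chunks, 2)])  # [2, 1]
```

Each batch would then be sent as the list of inputs in a single embeddings request, and the per-chunk vectors stored or pooled as the application requires.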
